Introduction to Data Science: Data Analysis and Prediction Algorithms with R

Download the book Introduction to Data Science: Data Analysis and Prediction Algorithms with R

Price: 49,000 Toman (in stock)

Introduction to Data Science: Data Analysis and Prediction Algorithms with R (original-language edition)

The download link for Introduction to Data Science: Data Analysis and Prediction Algorithms with R will become available after payment.
A description of the book is given in the Details section below.


This is the original (English) edition of the book; it is not in Persian.




About Introduction to Data Science: Data Analysis and Prediction Algorithms with R

Title: Introduction to Data Science: Data Analysis and Prediction Algorithms with R
Persian title: مقدمه ای بر علم داده: تحلیل داده ها و الگوریتم های پیش بینی با R
Series: data science series
Authors:
Publisher: CRC Press
Publication year: 2020
Number of pages: 0
ISBN: 9781000708035 , 9780429341830
Language: English
Format: epub (converted to PDF on user request)
File size: 20 MB



The download link will be provided once the payment process is complete. If you register and log in to your account, you will be able to view the list of books you have purchased.


Table of contents:


Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Preface
Acknowledgments
Introduction
1 Getting started with R and RStudio
1.1 Why R?
1.2 The R console
1.3 Scripts
1.4 RStudio
1.4.1 The panes
1.4.2 Key bindings
1.4.3 Running commands while editing scripts
1.4.4 Changing global options
1.5 Installing R packages
I R
2 R basics
2.1 Case study: US Gun Murders
2.2 The very basics
2.2.1 Objects
2.2.2 The workspace
2.2.3 Functions
2.2.4 Other prebuilt objects
2.2.5 Variable names
2.2.6 Saving your workspace
2.2.7 Motivating scripts
2.2.8 Commenting your code
2.3 Exercises
2.4 Data types
2.4.1 Data frames
2.4.2 Examining an object
2.4.3 The accessor: $
2.4.4 Vectors: numerics, characters, and logical
2.4.5 Factors
2.4.6 Lists
2.4.7 Matrices
2.5 Exercises
2.6 Vectors
2.6.1 Creating vectors
2.6.2 Names
2.6.3 Sequences
2.6.4 Subsetting
2.7 Coercion
2.7.1 Not availables (NA)
2.8 Exercises
2.9 Sorting
2.9.1 sort
2.9.2 order
2.9.3 max and which.max
2.9.4 rank
2.9.5 Beware of recycling
2.10 Exercises
2.11 Vector arithmetics
2.11.1 Rescaling a vector
2.11.2 Two vectors
2.12 Exercises
2.13 Indexing
2.13.1 Subsetting with logicals
2.13.2 Logical operators
2.13.3 which
2.13.4 match
2.13.5 %in%
2.14 Exercises
2.15 Basic plots
2.15.1 plot
2.15.2 hist
2.15.3 boxplot
2.15.4 image
2.16 Exercises
3 Programming basics
3.1 Conditional expressions
3.2 Defining functions
3.3 Namespaces
3.4 For-loops
3.5 Vectorization and functionals
3.6 Exercises
4 The tidyverse
4.1 Tidy data
4.2 Exercises
4.3 Manipulating data frames
4.3.1 Adding a column with mutate
4.3.2 Subsetting with filter
4.3.3 Selecting columns with select
4.4 Exercises
4.5 The pipe: %>%
4.6 Exercises
4.7 Summarizing data
4.7.1 summarize
4.7.2 pull
4.7.3 Group then summarize with group_by
4.8 Sorting data frames
4.8.1 Nested sorting
4.8.2 The top n
4.9 Exercises
4.10 Tibbles
4.10.1 Tibbles display better
4.10.2 Subsets of tibbles are tibbles
4.10.3 Tibbles can have complex entries
4.10.4 Tibbles can be grouped
4.10.5 Create a tibble using tibble instead of data.frame
4.11 The dot operator
4.12 do
4.13 The purrr package
4.14 Tidyverse conditionals
4.14.1 case_when
4.14.2 between
4.15 Exercises
5 Importing data
5.1 Paths and the working directory
5.1.1 The filesystem
5.1.2 Relative and full paths
5.1.3 The working directory
5.1.4 Generating path names
5.1.5 Copying files using paths
5.2 The readr and readxl packages
5.2.1 readr
5.2.2 readxl
5.3 Exercises
5.4 Downloading files
5.5 R-base importing functions
5.5.1 scan
5.6 Text versus binary files
5.7 Unicode versus ASCII
5.8 Organizing data with spreadsheets
5.9 Exercises
II Data Visualization
6 Introduction to data visualization
7 ggplot2
7.1 The components of a graph
7.2 ggplot objects
7.3 Geometries
7.4 Aesthetic mappings
7.5 Layers
7.5.1 Tinkering with arguments
7.6 Global versus local aesthetic mappings
7.7 Scales
7.8 Labels and titles
7.9 Categories as colors
7.10 Annotation, shapes, and adjustments
7.11 Add-on packages
7.12 Putting it all together
7.13 Quick plots with qplot
7.14 Grids of plots
7.15 Exercises
8 Visualizing data distributions
8.1 Variable types
8.2 Case study: describing student heights
8.3 Distribution function
8.4 Cumulative distribution functions
8.5 Histograms
8.6 Smoothed density
8.6.1 Interpreting the y-axis
8.6.2 Densities permit stratification
8.7 Exercises
8.8 The normal distribution
8.9 Standard units
8.10 Quantile-quantile plots
8.11 Percentiles
8.12 Boxplots
8.13 Stratification
8.14 Case study: describing student heights (continued)
8.15 Exercises
8.16 ggplot2 geometries
8.16.1 Barplots
8.16.2 Histograms
8.16.3 Density plots
8.16.4 Boxplots
8.16.5 QQ-plots
8.16.6 Images
8.16.7 Quick plots
8.17 Exercises
9 Data visualization in practice
9.1 Case study: new insights on poverty
9.1.1 Hans Rosling’s quiz
9.2 Scatterplots
9.3 Faceting
9.3.1 facet_wrap
9.3.2 Fixed scales for better comparisons
9.4 Time series plots
9.4.1 Labels instead of legends
9.5 Data transformations
9.5.1 Log transformation
9.5.2 Which base?
9.5.3 Transform the values or the scale?
9.6 Visualizing multimodal distributions
9.7 Comparing multiple distributions with boxplots and ridge plots
9.7.1 Boxplots
9.7.2 Ridge plots
9.7.3 Example: 1970 versus 2010 income distributions
9.7.4 Accessing computed variables
9.7.5 Weighted densities
9.8 The ecological fallacy and importance of showing the data
9.8.1 Logistic transformation
9.8.2 Show the data
10 Data visualization principles
10.1 Encoding data using visual cues
10.2 Know when to include 0
10.3 Do not distort quantities
10.4 Order categories by a meaningful value
10.5 Show the data
10.6 Ease comparisons
10.6.1 Use common axes
10.6.2 Align plots vertically to see horizontal changes and horizontally to see vertical changes
10.6.3 Consider transformations
10.6.4 Visual cues to be compared should be adjacent
10.6.5 Use color
10.7 Think of the color blind
10.8 Plots for two variables
10.8.1 Slope charts
10.8.2 Bland-Altman plot
10.9 Encoding a third variable
10.10 Avoid pseudo-three-dimensional plots
10.11 Avoid too many significant digits
10.12 Know your audience
10.13 Exercises
10.14 Case study: vaccines and infectious diseases
10.15 Exercises
11 Robust summaries
11.1 Outliers
11.2 Median
11.3 The interquartile range (IQR)
11.4 Tukey’s definition of an outlier
11.5 Median absolute deviation
11.6 Exercises
11.7 Case study: self-reported student heights
III Statistics with R
12 Introduction to statistics with R
13 Probability
13.1 Discrete probability
13.1.1 Relative frequency
13.1.2 Notation
13.1.3 Probability distributions
13.2 Monte Carlo simulations for categorical data
13.2.1 Setting the random seed
13.2.2 With and without replacement
13.3 Independence
13.4 Conditional probabilities
13.5 Addition and multiplication rules
13.5.1 Multiplication rule
13.5.2 Multiplication rule under independence
13.5.3 Addition rule
13.6 Combinations and permutations
13.6.1 Monte Carlo example
13.7 Examples
13.7.1 Monty Hall problem
13.7.2 Birthday problem
13.8 Infinity in practice
13.9 Exercises
13.10 Continuous probability
13.11 Theoretical continuous distributions
13.11.1 Theoretical distributions as approximations
13.11.2 The probability density
13.12 Monte Carlo simulations for continuous variables
13.13 Continuous distributions
13.14 Exercises
14 Random variables
14.1 Random variables
14.2 Sampling models
14.3 The probability distribution of a random variable
14.4 Distributions versus probability distributions
14.5 Notation for random variables
14.6 The expected value and standard error
14.6.1 Population SD versus the sample SD
14.7 Central Limit Theorem
14.7.1 How large is large in the Central Limit Theorem?
14.8 Statistical properties of averages
14.9 Law of large numbers
14.9.1 Misinterpreting law of averages
14.10 Exercises
14.11 Case study: The Big Short
14.11.1 Interest rates explained with chance model
14.11.2 The Big Short
14.12 Exercises
15 Statistical inference
15.1 Polls
15.1.1 The sampling model for polls
15.2 Populations, samples, parameters, and estimates
15.2.1 The sample average
15.2.2 Parameters
15.2.3 Polling versus forecasting
15.2.4 Properties of our estimate: expected value and standard error
15.3 Exercises
15.4 Central Limit Theorem in practice
15.4.1 A Monte Carlo simulation
15.4.2 The spread
15.4.3 Bias: why not run a very large poll?
15.5 Exercises
15.6 Confidence intervals
15.6.1 A Monte Carlo simulation
15.6.2 The correct language
15.7 Exercises
15.8 Power
15.9 p-values
15.10 Association tests
15.10.1 Lady Tasting Tea
15.10.2 Two-by-two tables
15.10.3 Chi-square Test
15.10.4 The odds ratio
15.10.5 Confidence intervals for the odds ratio
15.10.6 Small count correction
15.10.7 Large samples, small p-values
15.11 Exercises
16 Statistical models
16.1 Poll aggregators
16.1.1 Poll data
16.1.2 Pollster bias
16.2 Data-driven models
16.3 Exercises
16.4 Bayesian statistics
16.4.1 Bayes theorem
16.5 Bayes theorem simulation
16.5.1 Bayes in practice
16.6 Hierarchical models
16.7 Exercises
16.8 Case study: election forecasting
16.8.1 Bayesian approach
16.8.2 The general bias
16.8.3 Mathematical representations of models
16.8.4 Predicting the electoral college
16.8.5 Forecasting
16.9 Exercises
16.10 The t-distribution
17 Regression
17.1 Case study: is height hereditary?
17.2 The correlation coefficient
17.2.1 Sample correlation is a random variable
17.2.2 Correlation is not always a useful summary
17.3 Conditional expectations
17.4 The regression line
17.4.1 Regression improves precision
17.4.2 Bivariate normal distribution (advanced)
17.4.3 Variance explained
17.4.4 Warning: there are two regression lines
17.5 Exercises
18 Linear models
18.1 Case study: Moneyball
18.1.1 Sabermetrics
18.1.2 Baseball basics
18.1.3 No awards for BB
18.1.4 Base on balls or stolen bases?
18.1.5 Regression applied to baseball statistics
18.2 Confounding
18.2.1 Understanding confounding through stratification
18.2.2 Multivariate regression
18.3 Least squares estimates
18.3.1 Interpreting linear models
18.3.2 Least Squares Estimates (LSE)
18.3.3 The lm function
18.3.4 LSE are random variables
18.3.5 Predicted values are random variables
18.4 Exercises
18.5 Linear regression in the tidyverse
18.5.1 The broom package
18.6 Exercises
18.7 Case study: Moneyball (continued)
18.7.1 Adding salary and position information
18.7.2 Picking nine players
18.8 The regression fallacy
18.9 Measurement error models
18.10 Exercises
19 Association is not causation
19.1 Spurious correlation
19.2 Outliers
19.3 Reversing cause and effect
19.4 Confounders
19.4.1 Example: UC Berkeley admissions
19.4.2 Confounding explained graphically
19.4.3 Average after stratifying
19.5 Simpson’s paradox
19.6 Exercises
IV Data Wrangling
20 Introduction to data wrangling
21 Reshaping data
21.1 gather
21.2 spread
21.3 separate
21.4 unite
21.5 Exercises
22 Joining tables
22.1 Joins
22.1.1 Left join
22.1.2 Right join
22.1.3 Inner join
22.1.4 Full join
22.1.5 Semi join
22.1.6 Anti join
22.2 Binding
22.2.1 Binding columns
22.2.2 Binding by rows
22.3 Set operators
22.3.1 Intersect
22.3.2 Union
22.3.3 setdiff
22.3.4 setequal
22.4 Exercises
23 Web scraping
23.1 HTML
23.2 The rvest package
23.3 CSS selectors
23.4 JSON
23.5 Exercises
24 String processing
24.1 The stringr package
24.2 Case study 1: US murders data
24.3 Case study 2: self-reported heights
24.4 How to escape when defining strings
24.5 Regular expressions
24.5.1 Strings are a regexp
24.5.2 Special characters
24.5.3 Character classes
24.5.4 Anchors
24.5.5 Quantifiers
24.5.6 White space \s
24.5.7 Quantifiers: *, ?, +
24.5.8 Not
24.5.9 Groups
24.6 Search and replace with regex
24.6.1 Search and replace using groups
24.7 Testing and improving
24.8 Trimming
24.9 Changing lettercase
24.10 Case study 2: self-reported heights (continued)
24.10.1 The extract function
24.10.2 Putting it all together
24.11 String splitting
24.12 Case study 3: extracting tables from a PDF
24.13 Recoding
24.14 Exercises
25 Parsing dates and times
25.1 The date data type
25.2 The lubridate package
25.3 Exercises
26 Text mining
26.1 Case study: Trump tweets
26.2 Text as data
26.3 Sentiment analysis
26.4 Exercises
V Machine Learning
27 Introduction to machine learning
27.1 Notation
27.2 An example
27.3 Exercises
27.4 Evaluation metrics
27.4.1 Training and test sets
27.4.2 Overall accuracy
27.4.3 The confusion matrix
27.4.4 Sensitivity and specificity
27.4.5 Balanced accuracy and F1 score
27.4.6 Prevalence matters in practice
27.4.7 ROC and precision-recall curves
27.4.8 The loss function
27.5 Exercises
27.6 Conditional probabilities and expectations
27.6.1 Conditional probabilities
27.6.2 Conditional expectations
27.6.3 Conditional expectation minimizes squared loss function
27.7 Exercises
27.8 Case study: is it a 2 or a 7?
28 Smoothing
28.1 Bin smoothing
28.2 Kernels
28.3 Local weighted regression (loess)
28.3.1 Fitting parabolas
28.3.2 Beware of default smoothing parameters
28.4 Connecting smoothing to machine learning
28.5 Exercises
29 Cross validation
29.1 Motivation with k-nearest neighbors
29.1.1 Over-training
29.1.2 Over-smoothing
29.1.3 Picking the k in kNN
29.2 Mathematical description of cross validation
29.3 K-fold cross validation
29.4 Exercises
29.5 Bootstrap
29.6 Exercises
30 The caret package
30.1 The caret train function
30.2 Cross validation
30.3 Example: fitting with loess
31 Examples of algorithms
31.1 Linear regression
31.1.1 The predict function
31.2 Exercises
31.3 Logistic regression
31.3.1 Generalized linear models
31.3.2 Logistic regression with more than one predictor
31.4 Exercises
31.5 k-nearest neighbors
31.6 Exercises
31.7 Generative models
31.7.1 Naive Bayes
31.7.2 Controlling prevalence
31.7.3 Quadratic discriminant analysis
31.7.4 Linear discriminant analysis
31.7.5 Connection to distance
31.8 Case study: more than three classes
31.9 Exercises
31.10 Classification and regression trees (CART)
31.10.1 The curse of dimensionality
31.10.2 CART motivation
31.10.3 Regression trees
31.10.4 Classification (decision) trees
31.11 Random forests
31.12 Exercises
32 Machine learning in practice
32.1 Preprocessing
32.2 k-nearest neighbor and random forest
32.3 Variable importance
32.4 Visual assessments
32.5 Ensembles
32.6 Exercises
33 Large datasets
33.1 Matrix algebra
33.1.1 Notation
33.1.2 Converting a vector to a matrix
33.1.3 Row and column summaries
33.1.4 apply
33.1.5 Filtering columns based on summaries
33.1.6 Indexing with matrices
33.1.7 Binarizing the data
33.1.8 Vectorization for matrices
33.1.9 Matrix algebra operations
33.2 Exercises
33.3 Distance
33.3.1 Euclidean distance
33.3.2 Distance in higher dimensions
33.3.3 Euclidean distance example
33.3.4 Predictor space
33.3.5 Distance between predictors
33.4 Exercises
33.5 Dimension reduction
33.5.1 Preserving distance
33.5.2 Linear transformations (advanced)
33.5.3 Orthogonal transformations (advanced)
33.5.4 Principal component analysis
33.5.5 Iris example
33.5.6 MNIST example
33.6 Exercises
33.7 Recommendation systems
33.7.1 Movielens data
33.7.2 Recommendation systems as a machine learning challenge
33.7.3 Loss function
33.7.4 A first model
33.7.5 Modeling movie effects
33.7.6 User effects
33.8 Exercises
33.9 Regularization
33.9.1 Motivation
33.9.2 Penalized least squares
33.9.3 Choosing the penalty terms
33.10 Exercises
33.11 Matrix factorization
33.11.1 Factor analysis
33.11.2 Connection to SVD and PCA
33.12 Exercises
34 Clustering
34.1 Hierarchical clustering
34.2 k-means
34.3 Heatmaps
34.4 Filtering features
34.5 Exercises
VI Productivity Tools
35 Introduction to productivity tools
36 Organizing with Unix
36.1 Naming convention
36.2 The terminal
36.3 The filesystem
36.3.1 Directories and subdirectories
36.3.2 The home directory
36.3.3 Working directory
36.3.4 Paths
36.4 Unix commands
36.4.1 ls: Listing directory content
36.4.2 mkdir and rmdir: make and remove a directory
36.4.3 cd: navigating the filesystem by changing directories
36.5 Some examples
36.6 More Unix commands
36.6.1 mv: moving files
36.6.2 cp: copying files
36.6.3 rm: removing files
36.6.4 less: looking at a file
36.7 Preparing for a data science project
36.8 Advanced Unix
36.8.1 Arguments
36.8.2 Getting help
36.8.3 Pipes
36.8.4 Wild cards
36.8.5 Environment variables
36.8.6 Shells
36.8.7 Executables
36.8.8 Permissions and file types
36.8.9 Commands you should learn
36.8.10 File manipulation in R
37 Git and GitHub
37.1 Why use Git and GitHub?
37.2 GitHub accounts
37.3 GitHub repositories
37.4 Overview of Git
37.4.1 Clone
37.5 Initializing a Git directory
37.6 Using Git and GitHub in RStudio
38 Reproducible projects with RStudio and R markdown
38.1 RStudio projects
38.2 R markdown
38.2.1 The header
38.2.2 R code chunks
38.2.3 Global options
38.2.4 knitR
38.2.5 More on R markdown
38.3 Organizing a data science project
38.3.1 Create directories in Unix
38.3.2 Create an RStudio project
38.3.3 Edit some R scripts
38.3.4 Create some more directories using Unix
38.3.5 Add a README file
38.3.6 Initializing a Git directory
38.3.7 Add, commit, and push files using RStudio
Index



