About the book Data Analytics for the Social Sciences: Applications in R
Book title: Data Analytics for the Social Sciences: Applications in R
Edition: 1
Persian title: تجزیه و تحلیل داده ها برای علوم اجتماعی: برنامه های کاربردی در R
Series:
Author: G. David Garson
Publisher: Routledge
Publication year: 2021
Number of pages: 705
ISBN: 036762429X, 9780367624293
Language: English
Format: PDF
File size: 24 MB
Table of contents:
Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Acknowledgments
Preface
1. Using and abusing data analytics in social science
1.1. Introduction
1.2. The promise of data analytics for social science
1.2.1. Data analytics in public affairs and public policy
1.2.2. Data analytics in the social sciences
1.2.3. Data analytics in the humanities
1.3. Research design issues in data analytics
1.3.1. Beware the true believer
1.3.2. Pseudo-objectivity in data analytics
1.3.3. The bias of scholarship based on algorithms using big data
1.3.4. The subjectivity of algorithms
1.3.5. Big data and big noise
1.3.6. Limitations of the leading data science dissemination models
1.4. Social and ethical issues in data analytics
1.4.1. Types of ethical issues in data analytics
1.4.2. Bias toward the privileged
1.4.3. Discrimination
1.4.4. Diversity and data analytics
1.4.5. Distortion of democratic processes
1.4.6. Undermining of professional ethics
1.4.7. Privacy, profiling, and surveillance issues
1.4.8. The transparency issue
1.5. Summary: Technology and power
Endnotes
2. Statistical analytics with R, Part 1
PART I: OVERVIEW OF STATISTICAL ANALYSIS WITH R
2.1. Introduction
2.2. Data and packages used in this chapter
2.2.1. Example data
2.2.2. R packages used
PART II: QUICK START ON STATISTICAL ANALYSIS WITH R
2.3. Descriptive statistics
2.4. Linear multiple regression
PART III: STATISTICAL ANALYSIS WITH R IN DETAIL
2.5. Hypothesis testing
2.5.1. One-sample test of means
2.5.2. Means test for two independent samples
2.5.3. Means test for two dependent samples
2.6. Crosstabulation, significance, and association
2.7. Loglinear analysis for categorical variables
2.8. Correlation, correlograms, and scatterplots
2.9. Factor analysis (exploratory)
2.10. Multidimensional scaling
2.11. Reliability analysis
2.11.1. Cronbach’s alpha and Guttman’s lower bounds
2.11.2. Guttman’s lower bounds and Cronbach’s alpha
2.11.3. Krippendorff’s alpha and Cohen’s kappa
2.12. Cluster analysis
2.12.1. Hierarchical cluster analysis
2.12.2. K-means clustering
2.12.3. Nearest neighbor analysis
2.13. Analysis of variance
2.13.1. Data and packages used
2.13.2. GLM univariate: ANOVA
2.13.3. GLM univariate: ANCOVA
2.13.4. GLM multivariate: MANOVA
2.13.5. GLM multivariate: MANCOVA
2.14. Logistic regression
2.14.1. ROC and AUC analysis
2.14.2. Confusion table and accuracy
2.15. Mediation and moderation
2.16. Chapter 2 command summary
Endnotes
3. Statistical analytics with R, Part 2
PART I: OVERVIEW OF STATISTICAL ANALYTICS WITH R
3.1. Introduction
3.2. Data and packages used in this chapter
3.2.1. Example data
3.2.2. R Packages used
PART II: QUICK START ON STATISTICAL ANALYSIS PART 2
3.3. Quick start: Linear regression as a generalized linear model (GZLM)
3.3.1. Background to GZLM
3.3.2. The linear model in glm()
3.3.3. GZLM output
3.3.4. Fitted value, residuals, and plots
3.3.5. Noncanonical custom links
3.3.6. Multiple comparison tests
3.3.7. Estimated marginal means (EMM)
3.4. Quick start: Testing if multilevel modeling is needed
PART III: STATISTICAL ANALYSIS, PART 2, IN DETAIL
3.5. Generalized linear models (GZLM)
3.5.1. Introduction
3.5.2. Setup for GZLM models in R
3.5.3. Binary logistic regression example
3.5.4. Gamma regression model
3.5.5. Poisson regression model
3.5.6. Negative binomial regression
3.6. Multilevel modeling (MLM)
3.6.1. Introduction
3.6.2. Setup and data
3.6.3. The random coefficients model
3.6.4. Likelihood ratio test
3.7. Panel data regression (PDR)
3.7.1. Introduction
3.7.2. Types of PDR model
3.7.3. The Hausman test
3.7.4. Setup and data
3.7.5. PDR with the plm package
3.7.6. PDR with the panelr package
3.8. Structural equation modeling (SEM)
3.9. Missing data analysis and data imputation
3.10. Chapter 3 command summary
Endnotes
4. Classification and regression trees in R
PART I: OVERVIEW OF CLASSIFICATION AND REGRESSION TREES WITH R
4.1. Introduction
4.2. Advantages of decision tree analysis
4.3. Limitations of decision tree analysis
4.4. Decision tree terminology
4.5. Steps in decision tree analysis
4.6. Decision tree algorithms
4.7. Random forests and ensemble methods
4.8. Software
4.8.1. R language
4.8.2. Stata
4.8.3. SAS
4.8.4. SPSS
4.8.5. Python language
4.9. Data and packages used in this chapter
4.9.1. Example data
4.9.2. R packages used
PART II: QUICK START - CLASSIFICATION AND REGRESSION TREES
4.10. Classification tree example: Survival on the Titanic
4.11. Regression tree example: Correlates of murder
PART III: CLASSIFICATION AND REGRESSION TREES, IN DETAIL
4.12. Overview
4.13. The rpart() program
4.13.1. Introduction
4.13.2. Training and validation datasets
4.13.3. Setup for rpart() trees
4.14. Classification trees with the rpart package
4.14.1. The basic rpart classification tree
4.14.2. Printing tree rules
4.14.3. Visualization with prp() and draw.tree()
4.14.4. Visualization with fancyRpartPlot()
4.14.5. Interpreting tree summaries
4.14.6. Listing nodes by country and countries by node
4.14.7. Node distribution plots
4.14.8. Saving predictions and residuals
4.14.9. Cross-validation and pruning
4.14.10. The confusion matrix and model performance metrics
4.14.11. The ROC curve and AUC
4.14.12. Lift plots
4.14.13. Gains plots
4.14.14. Precision vs. recall plot
4.15. Regression trees with the rpart package
4.15.1. Setup
4.15.2. Creating an rpart regression tree
4.15.3. Printing tree rules
4.15.4. Visualization with prp() and fancyRpartPlot()
4.15.5. Interpreting tree summaries
4.15.6. The CP table
4.15.7. Listing nodes by country and countries by node
4.15.8. Saving predictions and residuals
4.15.9. Plotting residuals
4.15.10. Cross-validation and pruning
4.15.11. R-squared for regression trees
4.15.12. MSE for regression trees
4.15.13. The confusion matrix
4.15.14. The ROC curve and AUC
4.15.15. Gains plots
4.15.16. Gains plot with OLS comparison
4.16. The tree package
4.17. The ctree() program for conditional decision trees
4.18. More decision trees programs for R
4.19. Chapter 4 command summary
Endnotes
5. Random forests
PART I: OVERVIEW OF RANDOM FORESTS IN R
5.1. Introduction
5.1.1. Social science examples of random forest models
5.1.2. Advantages of random forests
5.1.3. Limitations of random forests
5.1.4. Data and packages
PART II: QUICK START – RANDOM FORESTS
5.2. Classification forest example: Searching for the causes of happiness
5.3. Regression forest example: Why so much crime in my town?
PART III: RANDOM FORESTS, IN DETAIL
5.4. Classification forests with randomForest()
5.4.1. Setup
5.4.2. A basic classification model
5.4.3. Output components of randomForest() objects for classification models
5.4.4. Graphing a randomForest tree?
5.4.5. Comparing randomForest() and rpart() performance
5.4.6. Tuning the random forest model
5.4.7. MDS cluster analysis of the RF classification model
5.5. Regression forests with randomForest()
5.5.1. Introduction
5.5.2. Setup
5.5.3. A basic regression model
5.5.4. Output components for regression forest models
5.5.5. Graphing a randomForest tree?
5.5.6. MDS plots
5.5.7. Quartile plots
5.5.8. Comparing randomForest() and rpart() regression models
5.5.9. Tuning the randomForest() regression model
5.5.10. Outliers: Identifying and removing
5.6. The randomForestExplainer package
5.6.1. Setup for the randomForestExplainer package
5.6.2. Minimal depth plots
5.6.3. Multiway variable importance plots
5.6.4. Multiway ranking of variable importance
5.6.5. Comparing randomForest and OLS rankings of predictors
5.6.6. Which importance criteria?
5.6.7. Interaction analysis
5.6.8. The explain_forest() function
5.7. Summary
5.8. Conditional inference forests
5.9. MDS plots for random forests
5.10. More random forest programs for R
5.11. Command summary
Endnotes
6. Modeling and machine learning
PART I: OVERVIEW OF MODELING AND MACHINE LEARNING
6.1. Introduction
6.1.1. Social science examples of modeling and machine learning in R
6.1.2. Advantages of modeling and machine learning in R
6.1.3. Limitations of modeling and machine learning in R
6.1.4. Data, packages, and default directory
PART II: QUICK START – MODELING AND MACHINE LEARNING
6.2. Example 1: Bayesian modeling of county-level poverty
6.2.1. Introduction
6.2.2. Setup
6.2.3. Correlation plot
6.2.4. The Bayes generalized linear model
6.3. Example 2: Predicting diabetes among Pima Indians with mlr3
6.3.1. Introduction
6.3.2. Setup
6.3.3. How mlr3 works
6.3.4. The Pima Indian data
PART III: MODELING AND MACHINE LEARNING IN DETAIL
6.4. Illustrating modeling and machine learning with SVM in caret
6.4.1. How SVM works
6.4.2. SVM algorithms compared to logistic and OLS regression
6.4.3. SVM kernels, types, and parameters
6.4.4. Tuning SVM models
6.4.5. SVM and longitudinal data
6.5. SVM versus OLS regression
6.6. SVM with the caret package: Predicting world literacy rates
6.6.1. Setup
6.6.2. Constructing the SVM regression model with caret
6.6.3. Obtaining predicted values and residuals
6.6.4. Model performance metrics
6.6.5. Variable importance
6.6.6. Other output elements
6.6.7. SVM plots
6.7. Tuning SVM models
6.7.1. Tuning for the train() command from the caret package
6.7.2. Tuning for the svm() command from the e1071 package
6.7.3. Cross-validating SVM models
6.7.4. Using e1071 in caret rather than the default kernlab package
6.8. SVM classification models: Classifying U.S. Senators
6.8.1. The “senate” example and setup
6.8.2. SVM classification with alternative kernels: Senate example
6.8.3. Tuning the SVM binary classification model
6.9. Gradient boosting machines (GBM)
6.9.1. Introduction
6.9.2. Setup and example data
6.9.3. Metrics for comparing models
6.9.4. The caret control object
6.9.5. Training the GBM model under caret
6.10. Learning vector quantization (LVQ)
6.10.1. Introduction
6.10.2. Setup and example data
6.10.3. Metrics for comparing models
6.10.4. The caret control object
6.10.5. Training the LVQ model under caret
6.11. Comparing models
6.12. Variable importance
6.12.1. Leave-one-out modeling
6.12.2. Recursive feature elimination (RFE) with caret
6.12.3. Other approaches to variable importance
6.13. SVM classification for a multinomial outcome
6.14. Command summary
Endnotes
7. Neural network models and deep learning
PART I: OVERVIEW OF NEURAL NETWORK MODELS AND DEEP LEARNING
7.1. Overview
7.2. Data and packages
7.3. Social science examples
7.4. Pros and cons of neural networks
7.5. Artificial neural network (ANN) concepts
7.5.1. ANN terms
7.5.2. R software programs for ANN
7.5.3. Training methods for ANN
7.5.4. Algorithms in neuralnet
7.5.5. Algorithms in nnet
7.5.6. Tuning ANN models
PART II: QUICK START - NEURAL NETWORK MODELS AND DEEP LEARNING
7.6. Example 1: Analyzing NYC airline delays
7.6.1. Introduction
7.6.2. General setup
7.6.3. Data preparation
7.6.4. Modeling NYC airline delays
7.7. Example 2: The classic iris classification example
7.7.1. Setup
7.7.2. Exploring separation with a violin plot
7.7.3. Normalizing the data
7.7.4. Training the model with nnet in caret
7.7.5. Obtain model predictions
7.7.6. Display the neural model
PART III: NEURAL NETWORK MODELS IN DETAIL
7.8. Analyzing Boston crime via the neuralnet package
7.8.1. Setup
7.8.2. The linear regression model for unscaled data
7.8.3. The neuralnet model for unscaled data
7.8.4. Scaling the data
7.8.5. The linear regression model for scaled data
7.8.6. The neuralnet model for scaled data
7.8.7. Neuralnet results for the training data
7.8.8. Model performance plots
7.8.9. Visualizing the neuralnet model
7.8.10. Variable importance for the neuralnet model
7.9. Analyzing Boston crime via neuralnet under the caret package
7.10. Analyzing Boston crime via nnet in caret
7.10.1. Setup
7.10.2. The nnet/caret model of Boston crime
7.10.3. Variable importance for the nnet/caret model
7.10.4. Further tuning the nnet model outside caret
7.11. A classification model of marital status using nnet
7.11.1. Setup
7.11.2. The nnet classification model of marital status
7.12. Neural network analysis using “mlr3keras”
7.13. Command summary
Endnotes
8. Network analysis
PART I: OVERVIEW OF NETWORK ANALYSIS WITH R
8.1. Introduction
8.2. Data and packages used in this chapter
8.3. Concepts in network analysis
8.4. Getting data into network format
PART II: QUICK START ON NETWORK ANALYSIS WITH R
8.5. Quick start exercise 1: The Medici family network
8.6. Quick start exercise 2: Marvel hero network communities
PART III: NETWORK ANALYSIS WITH R IN DETAIL
8.7. Interactive network analysis with visNetwork
8.7.1. Undirected networks: Research team management
8.7.2. Clustering by group: Research team grouped by gender
8.7.3. A larger network with navigation and circle layout
8.7.4. Visualizing classification and regression trees: National literacy
8.7.5. A directed network (asymmetrical relationships in a research team)
8.8. Network analysis with igraph
8.8.1. Term adjacency networks: Gubernatorial websites and the COVID pandemic
8.8.2. Similarity/distance networks with igraph: Senate interest group ratings
8.8.3. Communities, modularity, and centrality
8.8.4. Similarity network analysis: All senators
8.9. Using intergraph for network conversions
8.10. Network-on-a-map with the diagram and maps packages
8.11. Network analysis with the statnet and network packages
8.11.1. Introduction
8.11.2. Visualization
8.11.3. Neighborhoods
8.11.4. Cluster analysis
8.12. Clique analysis with sna
8.12.1. A simplified clique analysis
8.12.2. A clique analysis of the DHHS formal network
8.12.3. K-core analysis of the DHHS formal network
8.13. Mapping international trade flow with statnet and intergraph
8.14. Correlation networks with corrr
8.15. Network analysis with tidygraph
8.15.1. Introduction
8.15.2. A simple tidygraph example
8.15.3. Network conversions with tidygraph
8.15.4. Finding community clusters with tidygraph
8.16. Simulating networks
8.16.1. Agent-based network modeling with SchellingR
8.16.2. Agent-based network modeling with RSiena
8.16.3. Agent-based network modeling with NetLogoR
8.17. Summary
8.18. Command summary
Endnotes
9. Text analytics
PART I: OVERVIEW OF TEXT ANALYTICS WITH R
9.1. Overview
9.2. Data used in this chapter
9.3. Packages used in this chapter
9.4. What is a corpus?
9.5. Text files
9.5.1. Overview
9.5.2. Archived texts
9.5.3. Project Gutenberg archive
9.5.4. Comma-separated values (.csv) files
9.5.5. Text from Word .docx files with the textreadr package
9.5.6. Text from other formats with the readtext package
9.5.7. Text from raw text files
PART II: QUICK START ON TEXT ANALYTICS WITH R
9.6. Quick start exercise 1: Key word in context (kwic) indexing
9.7. Quick start exercise 2: Word frequencies and histograms
PART III: TEXT ANALYTICS WITH R IN DETAIL
9.8. Web scraping
9.8.1. Overview
9.8.2. Web scraping: The “htm2txt” package
9.8.3. Web scraping: The “rvest” package
9.9. Social media scraping
9.9.1. Analysis of Twitter data: Trump and the New York Times
9.9.2. Social media scraping with twitter
9.10. Leading text formats in R
9.10.1. Overview
9.10.2. Formats related to the “tidytext” package
9.10.3. Formats related to the “tm” package
9.10.4. Formats related to the “quanteda” package
9.10.5. Common text file conversions
9.11. Tokenization
9.11.1. Overview
9.11.2. Word tokenization
9.12. Character encoding
9.13. Text cleaning and preparation
9.14. Analysis: Multigroup word frequency comparisons
9.14.1. Multigroup analysis in tidytext
9.14.2. Multigroup analysis with quanteda’s textstat_keyness() command
9.14.3. Multigroup analysis with textstat_frequency() in quanteda and ggplot2
9.15. Analysis: Word clouds
9.16. Analysis: Comparison clouds
9.17. Analysis: Word maps and word correlations
9.17.1. Working with the tdm format
9.17.2. Working with the dtm format
9.17.3. Word frequencies and word correlations
9.17.4. Correlation plots of word and document associations
9.17.5. Plotting word stem correlations for word pairs
9.17.6. Word correlation maps
9.18. Analysis: Sentiment analysis
9.18.1. Overview
9.18.2. Example: sentiment analysis of news articles
9.19. Analysis: Topic modeling
9.19.1. Overview
9.19.2. Topic analysis example 1: Modeling topic frequency over time
9.19.3. Topic analysis example 2: LDA analysis
9.20. Analysis: Lexical dispersion plots
9.21. Analysis: Bigrams and ngrams
9.22. Command Summary
Endnotes
Appendix 1: Introduction to R and RStudio
Appendix 2: Data used in this book
References
Index