The Data Science Workshop

Download the book The Data Science Workshop

43,000 Toman, in stock

The Data Science Workshop, original-language edition

The Data Science Workshop can be downloaded after payment.
The book's description is given in the details section below.


This book is the original edition and is not in Persian.



Number of user ratings for this book: 8


Details of the book The Data Science Workshop

Title : The Data Science Workshop
Edition : 2
Persian title : کارگاه علم داده
Series :
Authors :
Publisher :
Year of publication : 2020
Number of pages : 823
ISBN : 9781800566927
Language : English
Format : pdf
File size : 24 MB



The download link will be provided once the payment process is complete. If you register and log in to your account, you will be able to view the list of books you have purchased.


Table of contents:


Cover; FM; Copyright; Table of Contents; Preface

Chapter 1: Introduction to Data Science in Python
Introduction; Application of Data Science; What Is Machine Learning?; Supervised Learning; Unsupervised Learning; Reinforcement Learning; Overview of Python; Types of Variable; Numeric Variables; Text Variables; Python List; Python Dictionary; Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms; Python for Data Science; The pandas Package; DataFrame and Series; CSV Files; Excel Spreadsheets; JSON; Exercise 1.02: Loading Data of Different Formats into a pandas DataFrame; Scikit-Learn; What Is a Model?; Model Hyperparameters; The sklearn API; Exercise 1.03: Predicting Breast Cancer from a Dataset Using sklearn; Activity 1.01: Train a Spam Detector Algorithm; Summary

Chapter 2: Regression
Introduction; Simple Linear Regression; The Method of Least Squares; Multiple Linear Regression; Estimating the Regression Coefficients (β0, β1, β2 and β3); Logarithmic Transformations of Variables; Correlation Matrices; Conducting Regression Analysis Using Python; Exercise 2.01: Loading and Preparing the Data for Analysis; The Correlation Coefficient; Exercise 2.02: Graphical Investigation of Linear Relationships Using Python; Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python; The Statsmodels formula API; Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels formula API; Analyzing the Model Summary; The Model Formula Language; Intercept Handling; Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels Formula API; Multiple Regression Analysis; Exercise 2.05: Fitting a Multiple Linear Regression Model Using the Statsmodels Formula API; Assumptions of Regression Analysis; Activity 2.02: Fitting a Multiple Log-Linear Regression Model; Explaining the Results of Regression Analysis; Regression Analysis Checks and Balances; The F-test; The t-test; Summary

Chapter 3: Binary Classification
Introduction; Understanding the Business Context; Business Discovery; Exercise 3.01: Loading and Exploring the Data from the Dataset; Testing Business Hypotheses Using Exploratory Data Analysis; Visualization for Exploratory Data Analysis; Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan; Intuitions from the Exploratory Analysis; Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits; Feature Engineering; Business-Driven Feature Engineering; Exercise 3.03: Feature Engineering – Exploration of Individual Features; Exercise 3.04: Feature Engineering – Creating New Features from Existing Ones; Data-Driven Feature Engineering; A Quick Peek at Data Types and a Descriptive Summary; Correlation Matrix and Visualization; Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data; Skewness of Data; Histograms; Density Plots; Other Feature Engineering Methods; Summarizing Feature Engineering; Building a Binary Classification Model Using the Logistic Regression Function; Logistic Regression Demystified; Metrics for Evaluating Model Performance; Confusion Matrix; Accuracy; Classification Report; Data Preprocessing; Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank; Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables; Next Steps; Summary

Chapter 4: Multiclass Classification with RandomForest
Introduction; Training a Random Forest Classifier; Evaluating the Model's Performance; Exercise 4.01: Building a Model for Classifying Animal Type and Assessing Its Performance; Number of Trees Estimator; Exercise 4.02: Tuning n_estimators to Reduce Overfitting; Maximum Depth; Exercise 4.03: Tuning max_depth to Reduce Overfitting; Minimum Sample in Leaf; Exercise 4.04: Tuning min_samples_leaf; Maximum Features; Exercise 4.05: Tuning max_features; Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset; Summary

Chapter 5: Performing Your First Cluster Analysis
Introduction; Clustering with k-means; Exercise 5.01: Performing Your First Clustering Analysis on the ATO Dataset; Interpreting k-means Results; Exercise 5.02: Clustering Australian Postcodes by Business Income and Expenses; Choosing the Number of Clusters; Exercise 5.03: Finding the Optimal Number of Clusters; Initializing Clusters; Exercise 5.04: Using Different Initialization Parameters to Achieve a Suitable Outcome; Calculating the Distance to the Centroid; Exercise 5.05: Finding the Closest Centroids in Our Dataset; Standardizing Data; Exercise 5.06: Standardizing the Data from Our Dataset; Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means; Summary

Chapter 6: How to Assess Performance
Introduction; Splitting Data; Exercise 6.01: Importing and Splitting Data; Assessing Model Performance for Regression Models; Data Structures – Vectors and Matrices; Scalars; Vectors; Matrices; R2 Score; Exercise 6.02: Computing the R2 Score of a Linear Regression Model; Mean Absolute Error; Exercise 6.03: Computing the MAE of a Model; Exercise 6.04: Computing the Mean Absolute Error of a Second Model; Other Evaluation Metrics; Assessing Model Performance for Classification Models; Exercise 6.05: Creating a Classification Model for Computing Evaluation Metrics; The Confusion Matrix; Exercise 6.06: Generating a Confusion Matrix for the Classification Model; More on the Confusion Matrix; Precision; Exercise 6.07: Computing Precision for the Classification Model; Recall; Exercise 6.08: Computing Recall for the Classification Model; F1 Score; Exercise 6.09: Computing the F1 Score for the Classification Model; Accuracy; Exercise 6.10: Computing Model Accuracy for the Classification Model; Logarithmic Loss; Exercise 6.11: Computing the Log Loss for the Classification Model; Receiver Operating Characteristic Curve; Exercise 6.12: Computing and Plotting ROC Curve for a Binary Classification Problem; Area Under the ROC Curve; Exercise 6.13: Computing the ROC AUC for the Caesarian Dataset; Saving and Loading Models; Exercise 6.14: Saving and Loading a Model; Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model; Summary

Chapter 7: The Generalization of Machine Learning Models
Introduction; Overfitting; Training on Too Many Features; Training for Too Long; Underfitting; Data; The Ratio for Dataset Splits; Creating Dataset Splits; Exercise 7.01: Importing and Splitting Data; Random State; Exercise 7.02: Setting a Random State When Splitting Data; Cross-Validation; KFold; Exercise 7.03: Creating a Five-Fold Cross-Validation Dataset; Exercise 7.04: Creating a Five-Fold Cross-Validation Dataset Using a Loop for Calls; cross_val_score; Exercise 7.05: Getting the Scores from Five-Fold Cross-Validation; Understanding Estimators That Implement CV; LogisticRegressionCV; Exercise 7.06: Training a Logistic Regression Model Using Cross-Validation; Hyperparameter Tuning with GridSearchCV; Decision Trees; Exercise 7.07: Using Grid Search with Cross-Validation to Find the Best Parameters for a Model; Hyperparameter Tuning with RandomizedSearchCV; Exercise 7.08: Using Randomized Search for Hyperparameter Tuning; Model Regularization with Lasso Regression; Exercise 7.09: Fixing Model Overfitting Using Lasso Regression; Ridge Regression; Exercise 7.10: Fixing Model Overfitting Using Ridge Regression; Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors; Summary

Chapter 8: Hyperparameter Tuning
Introduction; What Are Hyperparameters?; Difference between Hyperparameters and Statistical Model Parameters; Setting Hyperparameters; A Note on Defaults; Finding the Best Hyperparameterization; Exercise 8.01: Manual Hyperparameter Tuning for a k-NN Classifier; Advantages and Disadvantages of a Manual Search; Tuning Using Grid Search; Simple Demonstration of the Grid Search Strategy; GridSearchCV; Tuning using GridSearchCV; Support Vector Machine (SVM) Classifiers; Exercise 8.02: Grid Search Hyperparameter Tuning for an SVM; Advantages and Disadvantages of Grid Search; Random Search; Random Variables and Their Distributions; Simple Demonstration of the Random Search Process; Tuning Using RandomizedSearchCV; Exercise 8.03: Random Search Hyperparameter Tuning for a Random Forest Classifier; Advantages and Disadvantages of a Random Search; Activity 8.01: Is the Mushroom Poisonous?; Summary

Chapter 9: Interpreting a Machine Learning Model
Introduction; Linear Model Coefficients; Exercise 9.01: Extracting the Linear Regression Coefficient; RandomForest Variable Importance; Exercise 9.02: Extracting RandomForest Feature Importance; Variable Importance via Permutation; Exercise 9.03: Extracting Feature Importance via Permutation; Partial Dependence Plots; Exercise 9.04: Plotting Partial Dependence; Local Interpretation with LIME; Exercise 9.05: Local Interpretation with LIME; Activity 9.01: Train and Analyze a Network Intrusion Detection Model; Summary

Chapter 10: Analyzing a Dataset
Introduction; Exploring Your Data; Analyzing Your Dataset; Exercise 10.01: Exploring the Ames Housing Dataset with Descriptive Statistics; Analyzing the Content of a Categorical Variable; Exercise 10.02: Analyzing the Categorical Variables from the Ames Housing Dataset; Summarizing Numerical Variables; Exercise 10.03: Analyzing Numerical Variables from the Ames Housing Dataset; Visualizing Your Data Using the Altair API; Histogram for Numerical Variables; Bar Chart for Categorical Variables; Boxplots; Exercise 10.04: Visualizing the Ames Housing Dataset with Altair; Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques; Summary

Chapter 11: Data Preparation
Introduction; Handling Row Duplication; Exercise 11.01: Handling Duplicates in a Breast Cancer Dataset; Converting Data Types; Exercise 11.02: Converting Data Types for the Ames Housing Dataset; Handling Incorrect Values; Exercise 11.03: Fixing Incorrect Values in the State Column; Handling Missing Values; Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset; Activity 11.01: Preparing the Speed Dating Dataset; Summary

Chapter 12: Feature Engineering
Introduction; Merging Datasets; The Left Join; The Right Join; Exercise 12.01: Merging the ATO Dataset with the Postcode Data; Binning Variables; Exercise 12.02: Binning the YearBuilt Variable from the AMES Housing Dataset; Manipulating Dates; Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints; Performing Data Aggregation; Exercise 12.04: Feature Engineering Using Data Aggregation on the AMES Housing Dataset; Activity 12.01: Feature Engineering on a Financial Dataset; Summary

Chapter 13: Imbalanced Datasets
Introduction; Understanding the Business Context; Exercise 13.01: Benchmarking the Logistic Regression Model on the Dataset; Analysis of the Result; Challenges of Imbalanced Datasets; Strategies for Dealing with Imbalanced Datasets; Collecting More Data; Resampling Data; Exercise 13.02: Implementing Random Undersampling and Classification on Our Banking Dataset to Find the Optimal Result; Analysis; Generating Synthetic Samples; Implementation of SMOTE and MSMOTE; Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result; Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result; Applying Balancing Techniques on a Telecom Dataset; Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset; Summary

Chapter 14: Dimensionality Reduction
Introduction; Business Context; Exercise 14.01: Loading and Cleaning the Dataset; Creating a High-Dimensional Dataset; Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset; Strategies for Addressing High-Dimensional Datasets; Backward Feature Elimination (Recursive Feature Elimination); Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination; Forward Feature Selection; Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection; Principal Component Analysis (PCA); Exercise 14.04: Dimensionality Reduction Using PCA; Independent Component Analysis (ICA); Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis; Factor Analysis; Exercise 14.06: Dimensionality Reduction Using Factor Analysis; Comparing Different Dimensionality Reduction Techniques; Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset; Summary

Chapter 15: Ensemble Learning
Introduction; Ensemble Learning; Variance; Bias; Business Context; Exercise 15.01: Loading, Exploring, and Cleaning the Data; Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data; Simple Methods for Ensemble Learning; Averaging; Exercise 15.02: Ensemble Model Using the Averaging Technique; Weighted Averaging; Exercise 15.03: Ensemble Model Using the Weighted Averaging Technique; Iteration 2 with Different Weights; Max Voting; Exercise 15.04: Ensemble Model Using Max Voting; Advanced Techniques for Ensemble Learning; Bagging; Exercise 15.05: Ensemble Learning Using Bagging; Boosting; Exercise 15.06: Ensemble Learning Using Boosting; Stacking; Exercise 15.07: Ensemble Learning Using Stacking; Activity 15.02: Comparison of Advanced Ensemble Techniques; Summary

Index



