Book description:
Strengthen your understanding of data science and data analysis from a statistical perspective so that you can extract meaningful insights from your data using Python. Build a solid foundation in statistics for data science and machine learning using Python-based examples.
This book is a detailed guide covering the math and the various statistical methods required for data science tasks. It starts by showing how to preprocess data and inspect distributions and correlations from a statistical perspective. You will then learn the fundamentals of statistical analysis and apply its concepts to real-world datasets. As you progress, you will see how statistical concepts emerge from the different stages of data science pipelines, understand dataset summaries in the language of statistics, and use them to build a solid foundation for robust data products such as explanatory and predictive models. Once you have uncovered the working mechanism of data science algorithms, you will cover essential concepts for efficient data collection, cleaning, mining, visualization, and analysis. Finally, you will implement statistical methods in key machine learning tasks such as classification, regression, tree-based methods, and ensemble learning. By the end of this book, Essential Statistics for Non-STEM Data Analysts, you will have learned how to build and present a self-contained, statistics-backed data product to meet your business goals.
What you will learn:
Find out how to collect and load data into an analysis environment
Perform descriptive analysis to extract meaningful summaries from data
Delve into statistical tests with analysis of variance, time series analysis, and A/B test examples
Understand the statistics behind popular machine learning algorithms
Answer questions on statistics for data scientist interviews
Who this book is for:
This book is an entry-level guide for data science enthusiasts, data analysts, and anyone starting out in the field of data science who wants to learn the essential statistical concepts with the help of simple explanations and examples. If you are a developer or student with a non-mathematical background, you will find this book useful. Working knowledge of the Python programming language is required.
Table of contents:
Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Section 1: Getting Started with Statistics for Data Science
Chapter 1: Fundamentals of Data Collection, Cleaning, and Preprocessing
Technical requirements
Collecting data from various data sources
Reading data directly from files
Obtaining data from an API
Obtaining data from scratch
Data imputation
Preparing the dataset for imputation
Imputation with mean or median values
Imputation with the mode/most frequent value
Outlier removal
Data standardization – when and how
Examples involving the scikit-learn preprocessing module
Imputation
Standardization
Summary
Chapter 2: Essential Statistics for Data Assessment
Classifying numerical and categorical variables
Distinguishing between numerical and categorical variables
Understanding mean, median, and mode
Mean
Median
Mode
Learning about variance, standard deviation, quartiles, percentiles, and skewness
Variance
Standard deviation
Quartiles
Skewness
Knowing how to handle categorical variables and mixed data types
Frequencies and proportions
Transforming a continuous variable to a categorical one
Using bivariate and multivariate descriptive statistics
Covariance
Cross-tabulation
Summary
Chapter 3: Visualization with Statistical Graphs
Basic examples with the Python Matplotlib package
Elements of a statistical graph
Exploring important types of plotting in Matplotlib
Advanced visualization customization
Customizing the geometry
Customizing the aesthetics
Query-oriented statistical plotting
Example 1 – preparing data to fit the plotting function API
Example 2 – combining analysis with plain plotting
Presentation-ready plotting tips
Use styling
Font matters a lot
Summary
Section 2: Essentials of Statistical Analysis
Chapter 4: Sampling and Inferential Statistics
Understanding fundamental concepts in sampling techniques
Performing proper sampling under different scenarios
The dangers associated with non-probability sampling
Probability sampling – the safer approach
Understanding statistics associated with sampling
Sampling distribution of the sample mean
Standard error of the sample mean
The central limit theorem
Summary
Chapter 5: Common Probability Distributions
Understanding important concepts in probability
Events and sample space
The probability mass function and the probability density function
Subjective probability and empirical probability
Understanding common discrete probability distributions
Bernoulli distribution
Binomial distribution
Poisson distribution
Understanding the common continuous probability distribution
Uniform distribution
Exponential distribution
Normal distribution
Learning about joint and conditional distribution
Independency and conditional distribution
Understanding the power law and black swan
The ubiquitous power law
Be aware of the black swan
Summary
Chapter 6: Parametric Estimation
Understanding the concepts of parameter estimation and the features of estimators
Evaluation of estimators
Using the method of moments to estimate parameters
Example 1 – the number of 911 phone calls in a day
Example 2 – the bounds of uniform distribution
Applying the maximum likelihood approach with Python
Likelihood function
MLE for uniform distribution boundaries
MLE for modeling noise
MLE and the Bayesian theorem
Summary
Chapter 7: Statistical Hypothesis Testing
An overview of hypothesis testing
Understanding P-values, test statistics, and significance levels
Making sense of confidence intervals and P-values from visual examples
Calculating the P-value from discrete events
Calculating the P-value from the continuous PDF
Significance levels in t-distribution
The power of a hypothesis test
Using SciPy for common hypothesis testing
The paradigm
T-test
The normality hypothesis test
The goodness-of-fit test
A simple ANOVA model
Stationarity tests for time series
Examples of stationary and non-stationary time series
Appreciating A/B testing with a real-world example
Conducting an A/B test
Randomization and blocking
Common test statistics
Common mistakes in A/B tests
Summary
Section 3: Statistics for Machine Learning
Chapter 8: Statistics for Regression
Understanding a simple linear regression model and its rich content
Least squared error linear regression and variance decomposition
The coefficient of determination
Hypothesis testing
Connecting the relationship between regression and estimators
Simple linear regression as an estimator
Having hands-on experience with multivariate linear regression and collinearity analysis
Collinearity
Learning regularization from logistic regression examples
Summary
Chapter 9: Statistics for Classification
Understanding how a logistic regression classifier works
The formulation of a classification problem
Implementing logistic regression from scratch
Evaluating the performance of the logistic regression classifier
Building a naïve Bayes classifier from scratch
Underfitting, overfitting, and cross-validation
Summary
Chapter 10: Statistics for Tree-Based Methods
Overviewing tree-based methods for classification tasks
Growing and pruning a classification tree
Understanding how splitting works
Evaluating decision tree performance
Exploring regression tree
Using tree models in scikit-learn
Summary
Chapter 11: Statistics for Ensemble Methods
Revisiting bias, variance, and memorization
Understanding the bootstrapping and bagging techniques
Understanding and using the boosting module
Exploring random forests with scikit-learn
Summary
Section 4: Appendix
Chapter 12: A Collection of Best Practices
Understanding the importance of data quality
Understanding why data can be problematic
Avoiding the use of misleading graphs
Example 1 – COVID-19 trend
Example 2 – Bar plot cropping
Fighting against false arguments
Summary
Chapter 13: Exercises and Projects
Exercises
Chapter 1 – Fundamentals of Data Collection, Cleaning, and Preprocessing
Chapter 2 – Essential Statistics for Data Assessment
Chapter 3 – Visualization with Statistical Graphs
Chapter 4 – Sampling and Inferential Statistics
Chapter 5 – Common Probability Distributions
Chapter 6 – Parameter Estimation
Chapter 7 – Statistical Hypothesis Testing
Chapter 8 – Statistics for Regression
Chapter 9 – Statistics for Classification
Chapter 10 – Statistics for Tree-Based Methods
Chapter 11 – Statistics for Ensemble Methods
Project suggestions
Non-tabular data
Real-time weather data
Goodness of fit for discrete distributions
Building a weather prediction web app
Building a typing suggestion app
Further reading
Textbooks
Visualization
Exercising your mind
Summary
Other Books You May Enjoy
Index
Book description in the original language:
Reinforce your understanding of data science and data analysis from a statistical perspective to extract meaningful insights from your data using Python programming
Key features:
Work your way through the entire data analysis pipeline with statistics concerns in mind to make reasonable decisions
Understand how various data science algorithms function
Build a solid foundation in statistics for data science and machine learning using Python-based examples
Book Description:
Statistics remain the backbone of modern analysis tasks, helping you to interpret the results produced by data science pipelines. This book is a detailed guide covering the math and various statistical methods required for undertaking data science tasks. The book starts by showing you how to preprocess data and inspect distributions and correlations from a statistical perspective. You'll then get to grips with the fundamentals of statistical analysis and apply its concepts to real-world datasets. As you advance, you'll find out how statistical concepts emerge from different stages of data science pipelines, understand the summary of datasets in the language of statistics, and use it to build a solid foundation for robust data products such as explanatory models and predictive models. Once you've uncovered the working mechanism of data science algorithms, you'll cover essential concepts for efficient data collection, cleaning, mining, visualization, and analysis. Finally, you'll implement statistical methods in key machine learning tasks such as classification, regression, tree-based methods, and ensemble learning. By the end of this Essential Statistics for Non-STEM Data Analysts book, you'll have learned how to build and present a self-contained, statistics-backed data product to meet your business goals.
What you will learn:
Find out how to grab and load data into an analysis environment
Perform descriptive analysis to extract meaningful summaries from data
Discover probability, parameter estimation, hypothesis tests, and experiment design best practices
Get to grips with resampling and bootstrapping in Python
Delve into statistical tests with variance analysis, time series analysis, and A/B test examples
Understand the statistics behind popular machine learning algorithms
Answer questions on statistics for data scientist interviews
Who this book is for:
This book is an entry-level guide for data science enthusiasts, data analysts, and anyone starting out in the field of data science and looking to learn the essential statistical concepts with the help of simple explanations and examples. If you're a developer or student with a non-mathematical background, you'll find this book useful. Working knowledge of the Python programming language is required.