About the book Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark
Book title: Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark
Edition: 1
Authors: Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
Publisher: O'Reilly Media
Year of publication: 2022
Pages: 236
ISBN: 1098103653, 9781098103651
Language: English
Format: PDF
File size: 10 MB
Table of contents:
Cover
Copyright
Table of Contents
Preface
Why Did We Write This Book Now?
How This Book Is Organized
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Analyzing Big Data
Working with Big Data
Introducing Apache Spark and PySpark
Components
PySpark
Ecosystem
Spark 3.0
PySpark Addresses Challenges of Data Science
Where to Go from Here
Chapter 2. Introduction to Data Analysis with PySpark
Spark Architecture
Installing PySpark
Setting Up Our Data
Analyzing Data with the DataFrame API
Fast Summary Statistics for DataFrames
Pivoting and Reshaping DataFrames
Joining DataFrames and Selecting Features
Scoring and Model Evaluation
Where to Go from Here
Chapter 3. Recommending Music and the Audioscrobbler Dataset
Setting Up the Data
Our Requirements for a Recommender System
Alternating Least Squares Algorithm
Preparing the Data
Building a First Model
Spot Checking Recommendations
Evaluating Recommendation Quality
Computing AUC
Hyperparameter Selection
Making Recommendations
Where to Go from Here
Chapter 4. Making Predictions with Decision Trees and Decision Forests
Decision Trees and Forests
Preparing the Data
Our First Decision Tree
Decision Tree Hyperparameters
Tuning Decision Trees
Categorical Features Revisited
Random Forests
Making Predictions
Where to Go from Here
Chapter 5. Anomaly Detection with K-means Clustering
K-means Clustering
Identifying Anomalous Network Traffic
KDD Cup 1999 Dataset
A First Take on Clustering
Choosing k
Visualization with SparkR
Feature Normalization
Categorical Variables
Using Labels with Entropy
Clustering in Action
Where to Go from Here
Chapter 6. Understanding Wikipedia with LDA and Spark NLP
Latent Dirichlet Allocation
LDA in PySpark
Getting the Data
Spark NLP
Setting Up Your Environment
Parsing the Data
Preparing the Data Using Spark NLP
TF-IDF
Computing the TF-IDFs
Creating Our LDA Model
Where to Go from Here
Chapter 7. Geospatial and Temporal Data Analysis on Taxi Trip Data
Preparing the Data
Converting Datetime Strings to Timestamps
Handling Invalid Records
Geospatial Analysis
Intro to GeoJSON
GeoPandas
Sessionization in PySpark
Building Sessions: Secondary Sorts in PySpark
Where to Go from Here
Chapter 8. Estimating Financial Risk
Terminology
Methods for Calculating VaR
Variance-Covariance
Historical Simulation
Monte Carlo Simulation
Our Model
Getting the Data
Preparing the Data
Determining the Factor Weights
Sampling
The Multivariate Normal Distribution
Running the Trials
Visualizing the Distribution of Returns
Where to Go from Here
Chapter 9. Analyzing Genomics Data and the BDG Project
Decoupling Storage from Modeling
Setting Up ADAM
Introduction to Working with Genomics Data Using ADAM
File Format Conversion with the ADAM CLI
Ingesting Genomics Data Using PySpark and ADAM
Predicting Transcription Factor Binding Sites from ENCODE Data
Where to Go from Here
Chapter 10. Image Similarity Detection with Deep Learning and PySpark LSH
PyTorch
Installation
Preparing the Data
Resizing Images Using PyTorch
Deep Learning Model for Vector Representation of Images
Image Embeddings
Import Image Embeddings into PySpark
Image Similarity Search Using PySpark LSH
Nearest Neighbor Search
Where to Go from Here
Chapter 11. Managing the Machine Learning Lifecycle with MLflow
Machine Learning Lifecycle
MLflow
Experiment Tracking
Managing and Serving ML Models
Creating and Using MLflow Projects
Where to Go from Here
Index
About the Authors
Colophon