Mining of Massive Datasets [Team-IRA]


This is the original English-language edition; the book has not been translated into Persian.




About the book Mining of Massive Datasets [Team-IRA]

Title: Mining of Massive Datasets [Team-IRA]
Edition: 3
Persian title (translation): استخراج مجموعه داده های عظیم [Team-IRA]
Series:
Authors: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Publisher: Cambridge University Press
Year of publication: 2020
Number of pages: 567
ISBN: 1108476341, 9781108476348
Language: English
Format: PDF
File size: 5 MB





Table of Contents:


Cover
Half title
Title page
Copyright information
Contents
Preface
1 Data Mining
1.1 What is Data Mining?
1.1.1 Modeling
1.1.2 Statistical Modeling
1.1.3 Machine Learning
1.1.4 Computational Approaches to Modeling
1.1.5 Summarization
1.1.6 Feature Extraction
1.2 Statistical Limits on Data Mining
1.2.1 Total Information Awareness
1.2.2 Bonferroni’s Principle
1.2.3 An Example of Bonferroni’s Principle
1.2.4 Exercises for Section 1.2
1.3 Things Useful to Know
1.3.1 Importance of Words in Documents
1.3.2 Hash Functions
1.3.3 Indexes
1.3.4 Secondary Storage
1.3.5 The Base of Natural Logarithms
1.3.6 Power Laws
1.3.7 Exercises for Section 1.3
1.4 Outline of the Book
1.5 Summary of Chapter 1
1.6 References for Chapter 1
2 MapReduce and the New Software Stack
2.1 Distributed File Systems
2.1.1 Physical Organization of Compute Nodes
2.1.2 Large-Scale File-System Organization
2.2 MapReduce
2.2.1 The Map Tasks
2.2.2 Grouping by Key
2.2.3 The Reduce Tasks
2.2.4 Combiners
2.2.5 Details of MapReduce Execution
2.2.6 Coping With Node Failures
2.2.7 Exercises for Section 2.2
2.3 Algorithms Using MapReduce
2.3.1 Matrix-Vector Multiplication by MapReduce
2.3.2 If the Vector v Cannot Fit in Main Memory
2.3.3 Relational-Algebra Operations
2.3.4 Computing Selections by MapReduce
2.3.5 Computing Projections by MapReduce
2.3.6 Union, Intersection, and Difference by MapReduce
2.3.7 Computing Natural Join by MapReduce
2.3.8 Grouping and Aggregation by MapReduce
2.3.9 Matrix Multiplication
2.3.10 Matrix Multiplication with One MapReduce Step
2.3.11 Exercises for Section 2.3
2.4 Extensions to MapReduce
2.4.1 Workflow Systems
2.4.2 Spark
2.4.3 Spark Implementation
2.4.4 TensorFlow
2.4.5 Recursive Extensions to MapReduce
2.4.6 Bulk-Synchronous Systems
2.4.7 Exercises for Section 2.4
2.5 The Communication-Cost Model
2.5.1 Communication Cost for Task Networks
2.5.2 Wall-Clock Time
2.5.3 Multiway Joins
2.5.4 Exercises for Section 2.5
2.6 Complexity Theory for MapReduce
2.6.1 Reducer Size and Replication Rate
2.6.2 An Example: Similarity Joins
2.6.3 A Graph Model for MapReduce Problems
2.6.4 Mapping Schemas
2.6.5 When Not All Inputs Are Present
2.6.6 Lower Bounds on Replication Rate
2.6.7 Case Study: Matrix Multiplication
2.6.8 Exercises for Section 2.6
2.7 Summary of Chapter 2
2.8 References for Chapter 2
3 Finding Similar Items
3.1 Applications of Set Similarity
3.1.1 Jaccard Similarity of Sets
3.1.2 Similarity of Documents
3.1.3 Collaborative Filtering as a Similar-Sets Problem
3.1.4 Exercises for Section 3.1
3.2 Shingling of Documents
3.2.1 k-Shingles
3.2.2 Choosing the Shingle Size
3.2.3 Hashing Shingles
3.2.4 Shingles Built from Words
3.2.5 Exercises for Section 3.2
3.3 Similarity-Preserving Summaries of Sets
3.3.1 Matrix Representation of Sets
3.3.2 Minhashing
3.3.3 Minhashing and Jaccard Similarity
3.3.4 Minhash Signatures
3.3.5 Computing Minhash Signatures in Practice
3.3.6 Speeding Up Minhashing
3.3.7 Speedup Using Hash Functions
3.3.8 Exercises for Section 3.3
3.4 Locality-Sensitive Hashing for Documents
3.4.1 LSH for Minhash Signatures
3.4.2 Analysis of the Banding Technique
3.4.3 Combining the Techniques
3.4.4 Exercises for Section 3.4
3.5 Distance Measures
3.5.1 Definition of a Distance Measure
3.5.2 Euclidean Distances
3.5.3 Jaccard Distance
3.5.4 Cosine Distance
3.5.5 Edit Distance
3.5.6 Hamming Distance
3.5.7 Exercises for Section 3.5
3.6 The Theory of Locality-Sensitive Functions
3.6.1 Locality-Sensitive Functions
3.6.2 Locality-Sensitive Families for Jaccard Distance
3.6.3 Amplifying a Locality-Sensitive Family
3.6.4 Exercises for Section 3.6
3.7 LSH Families for Other Distance Measures
3.7.1 LSH Families for Hamming Distance
3.7.2 Random Hyperplanes and the Cosine Distance
3.7.3 Sketches
3.7.4 LSH Families for Euclidean Distance
3.7.5 More LSH Families for Euclidean Spaces
3.7.6 Exercises for Section 3.7
3.8 Applications of Locality-Sensitive Hashing
3.8.1 Entity Resolution
3.8.2 An Entity-Resolution Example
3.8.3 Validating Record Matches
3.8.4 Matching Fingerprints
3.8.5 An LSH Family for Fingerprint Matching
3.8.6 Similar News Articles
3.8.7 Exercises for Section 3.8
3.9 Methods for High Degrees of Similarity
3.9.1 Finding Identical Items
3.9.2 Representing Sets as Strings
3.9.3 Length-Based Filtering
3.9.4 Prefix Indexing
3.9.5 Using Position Information
3.9.6 Using Position and Length in Indexes
3.9.7 Exercises for Section 3.9
3.10 Summary of Chapter 3
3.11 References for Chapter 3
4 Mining Data Streams
4.1 The Stream Data Model
4.1.1 A Data-Stream-Management System
4.1.2 Examples of Stream Sources
4.1.3 Stream Queries
4.1.4 Issues in Stream Processing
4.2 Sampling Data in a Stream
4.2.1 A Motivating Example
4.2.2 Obtaining a Representative Sample
4.2.3 The General Sampling Problem
4.2.4 Varying the Sample Size
4.2.5 Exercises for Section 4.2
4.3 Filtering Streams
4.3.1 A Motivating Example
4.3.2 The Bloom Filter
4.3.3 Analysis of Bloom Filtering
4.3.4 Exercises for Section 4.3
4.4 Counting Distinct Elements in a Stream
4.4.1 The Count-Distinct Problem
4.4.2 The Flajolet–Martin Algorithm
4.4.3 Combining Estimates
4.4.4 Space Requirements
4.4.5 Exercises for Section 4.4
4.5 Estimating Moments
4.5.1 Definition of Moments
4.5.2 The Alon–Matias–Szegedy Algorithm for Second Moments
4.5.3 Why the Alon–Matias–Szegedy Algorithm Works
4.5.4 Higher-Order Moments
4.5.5 Dealing With Infinite Streams
4.5.6 Exercises for Section 4.5
4.6 Counting Ones in a Window
4.6.1 The Cost of Exact Counts
4.6.2 The Datar–Gionis–Indyk–Motwani Algorithm
4.6.3 Storage Requirements for the DGIM Algorithm
4.6.4 Query Answering in the DGIM Algorithm
4.6.5 Maintaining the DGIM Conditions
4.6.6 Reducing the Error
4.6.7 Extensions to the Counting of Ones
4.6.8 Exercises for Section 4.6
4.7 Decaying Windows
4.7.1 The Problem of Most-Common Elements
4.7.2 Definition of the Decaying Window
4.7.3 Finding the Most Popular Elements
4.8 Summary of Chapter 4
4.9 References for Chapter 4
5 Link Analysis
5.1 PageRank
5.1.1 Early Search Engines and Term Spam
5.1.2 Definition of PageRank
5.1.3 Structure of the Web
5.1.4 Avoiding Dead Ends
5.1.5 Spider Traps and Taxation
5.1.6 Using PageRank in a Search Engine
5.1.7 Exercises for Section 5.1
5.2 Efficient Computation of PageRank
5.2.1 Representing Transition Matrices
5.2.2 PageRank Iteration Using MapReduce
5.2.3 Use of Combiners to Consolidate the Result Vector
5.2.4 Representing Blocks of the Transition Matrix
5.2.5 Other Efficient Approaches to PageRank Iteration
5.2.6 Exercises for Section 5.2
5.3 Topic-Sensitive PageRank
5.3.1 Motivation for Topic-Sensitive PageRank
5.3.2 Biased Random Walks
5.3.3 Using Topic-Sensitive PageRank
5.3.4 Inferring Topics from Words
5.3.5 Exercises for Section 5.3
5.4 Link Spam
5.4.1 Architecture of a Spam Farm
5.4.2 Analysis of a Spam Farm
5.4.3 Combating Link Spam
5.4.4 TrustRank
5.4.5 Spam Mass
5.4.6 Exercises for Section 5.4
5.5 Hubs and Authorities
5.5.1 The Intuition Behind HITS
5.5.2 Formalizing Hubbiness and Authority
5.5.3 Exercises for Section 5.5
5.6 Summary of Chapter 5
5.7 References for Chapter 5
6 Frequent Itemsets
6.1 The Market-Basket Model
6.1.1 Definition of Frequent Itemsets
6.1.2 Applications of Frequent Itemsets
6.1.3 Association Rules
6.1.4 Finding Association Rules with High Confidence
6.1.5 Exercises for Section 6.1
6.2 Market Baskets and the A-Priori Algorithm
6.2.1 Representation of Market-Basket Data
6.2.2 Use of Main Memory for Itemset Counting
6.2.3 Monotonicity of Itemsets
6.2.4 Tyranny of Counting Pairs
6.2.5 The A-Priori Algorithm
6.2.6 A-Priori for All Frequent Itemsets
6.2.7 Exercises for Section 6.2
6.3 Handling Larger Datasets in Main Memory
6.3.1 The Algorithm of Park, Chen, and Yu
6.3.2 The Multistage Algorithm
6.3.3 The Multihash Algorithm
6.3.4 Exercises for Section 6.3
6.4 Limited-Pass Algorithms
6.4.1 The Simple, Randomized Algorithm
6.4.2 Avoiding Errors in Sampling Algorithms
6.4.3 The Algorithm of Savasere, Omiecinski, and Navathe
6.4.4 The SON Algorithm and MapReduce
6.4.5 Toivonen’s Algorithm
6.4.6 Why Toivonen’s Algorithm Works
6.4.7 Exercises for Section 6.4
6.5 Counting Frequent Items in a Stream
6.5.1 Sampling Methods for Streams
6.5.2 Frequent Itemsets in Decaying Windows
6.5.3 Hybrid Methods
6.5.4 Exercises for Section 6.5
6.6 Summary of Chapter 6
6.7 References for Chapter 6
7 Clustering
7.1 Introduction to Clustering Techniques
7.1.1 Points, Spaces, and Distances
7.1.2 Clustering Strategies
7.1.3 The Curse of Dimensionality
7.1.4 Exercises for Section 7.1
7.2 Hierarchical Clustering
7.2.1 Hierarchical Clustering in a Euclidean Space
7.2.2 Efficiency of Hierarchical Clustering
7.2.3 Alternative Rules for Controlling Hierarchical Clustering
7.2.4 Hierarchical Clustering in Non-Euclidean Spaces
7.2.5 Exercises for Section 7.2
7.3 K-means Algorithms
7.3.1 K-Means Basics
7.3.2 Initializing Clusters for K-Means
7.3.3 Picking the Right Value of k
7.3.4 The Algorithm of Bradley, Fayyad, and Reina
7.3.5 Processing Data in the BFR Algorithm
7.3.6 Exercises for Section 7.3
7.4 The CURE Algorithm
7.4.1 Initialization in CURE
7.4.2 Completion of the CURE Algorithm
7.4.3 Exercises for Section 7.4
7.5 Clustering in Non-Euclidean Spaces
7.5.1 Representing Clusters in the GRGPF Algorithm
7.5.2 Initializing the Cluster Tree
7.5.3 Adding Points in the GRGPF Algorithm
7.5.4 Splitting and Merging Clusters
7.5.5 Exercises for Section 7.5
7.6 Clustering for Streams and Parallelism
7.6.1 The Stream-Computing Model
7.6.2 A Stream-Clustering Algorithm
7.6.3 Initializing Buckets
7.6.4 Merging Buckets
7.6.5 Answering Queries
7.6.6 Clustering in a Parallel Environment
7.6.7 Exercises for Section 7.6
7.7 Summary of Chapter 7
7.8 References for Chapter 7
8 Advertising on the Web
8.1 Issues in On-Line Advertising
8.1.1 Advertising Opportunities
8.1.2 Direct Placement of Ads
8.1.3 Issues for Display Ads
8.2 On-Line Algorithms
8.2.1 On-Line and Off-Line Algorithms
8.2.2 Greedy Algorithms
8.2.3 The Competitive Ratio
8.2.4 Exercises for Section 8.2
8.3 The Matching Problem
8.3.1 Matches and Perfect Matches
8.3.2 The Greedy Algorithm for Maximal Matching
8.3.3 Competitive Ratio for Greedy Matching
8.3.4 Exercises for Section 8.3
8.4 The Adwords Problem
8.4.1 History of Search Advertising
8.4.2 Definition of the Adwords Problem
8.4.3 The Greedy Approach to the Adwords Problem
8.4.4 The Balance Algorithm
8.4.5 A Lower Bound on Competitive Ratio for Balance
8.4.6 The Balance Algorithm with Many Bidders
8.4.7 The Generalized Balance Algorithm
8.4.8 Final Observations About the Adwords Problem
8.4.9 Exercises for Section 8.4
8.5 Adwords Implementation
8.5.1 Matching Bids and Search Queries
8.5.2 More Complex Matching Problems
8.5.3 A Matching Algorithm for Documents and Bids
8.6 Summary of Chapter 8
8.7 References for Chapter 8
9 Recommendation Systems
9.1 A Model for Recommendation Systems
9.1.1 The Utility Matrix
9.1.2 The Long Tail
9.1.3 Applications of Recommendation Systems
9.1.4 Populating the Utility Matrix
9.2 Content-Based Recommendations
9.2.1 Item Profiles
9.2.2 Discovering Features of Documents
9.2.3 Obtaining Item Features From Tags
9.2.4 Representing Item Profiles
9.2.5 User Profiles
9.2.6 Recommending Items to Users Based on Content
9.2.7 Classification Algorithms
9.2.8 Exercises for Section 9.2
9.3 Collaborative Filtering
9.3.1 Measuring Similarity
9.3.2 The Duality of Similarity
9.3.3 Clustering Users and Items
9.3.4 Exercises for Section 9.3
9.4 Dimensionality Reduction
9.4.1 UV-Decomposition
9.4.2 Root-Mean-Square Error
9.4.3 Incremental Computation of a UV-Decomposition
9.4.4 Optimizing an Arbitrary Element
9.4.5 Building a Complete UV-Decomposition Algorithm
9.4.6 Exercises for Section 9.4
9.5 The Netflix Challenge
9.6 Summary of Chapter 9
9.7 References for Chapter 9
10 Mining Social-Network Graphs
10.1 Social Networks as Graphs
10.1.1 What is a Social Network?
10.1.2 Social Networks as Graphs
10.1.3 Varieties of Social Networks
10.1.4 Graphs With Several Node Types
10.1.5 Exercises for Section 10.1
10.2 Clustering of Social-Network Graphs
10.2.1 Distance Measures for Social-Network Graphs
10.2.2 Applying Standard Clustering Methods
10.2.3 Betweenness
10.2.4 The Girvan–Newman Algorithm
10.2.5 Using Betweenness to Find Communities
10.2.6 Exercises for Section 10.2
10.3 Direct Discovery of Communities
10.3.1 Finding Cliques
10.3.2 Complete Bipartite Graphs
10.3.3 Finding Complete Bipartite Subgraphs
10.3.4 Why Complete Bipartite Graphs Must Exist
10.3.5 Exercises for Section 10.3
10.4 Partitioning of Graphs
10.4.1 What Makes a Good Partition?
10.4.2 Normalized Cuts
10.4.3 Some Matrices That Describe Graphs
10.4.4 Eigenvalues of the Laplacian Matrix
10.4.5 Alternative Partitioning Methods
10.4.6 Exercises for Section 10.4
10.5 Finding Overlapping Communities
10.5.1 The Nature of Communities
10.5.2 Maximum-Likelihood Estimation
10.5.3 The Affiliation-Graph Model
10.5.4 Discrete Optimization of Community Assignments
10.5.5 Avoiding the Use of Discrete Membership Changes
10.5.6 Exercises for Section 10.5
10.6 Simrank
10.6.1 Random Walkers on a Social Graph
10.6.2 Random Walks with Restart
10.6.3 Approximate Simrank
10.6.4 Why Approximate Simrank Works
10.6.5 Application of Simrank to Finding Communities
10.6.6 Exercises for Section 10.6
10.7 Counting Triangles
10.7.1 Why Count Triangles?
10.7.2 An Algorithm for Finding Triangles
10.7.3 Optimality of the Triangle-Finding Algorithm
10.7.4 Finding Triangles Using MapReduce
10.7.5 Using Fewer Reduce Tasks
10.7.6 Exercises for Section 10.7
10.8 Neighborhood Properties of Graphs
10.8.1 Directed Graphs and Neighborhoods
10.8.2 The Diameter of a Graph
10.8.3 Transitive Closure and Reachability
10.8.4 Reachability Via MapReduce
10.8.5 Semi-naive Evaluation
10.8.6 Linear Transitive Closure
10.8.7 Transitive Closure by Recursive Doubling
10.8.8 Smart Transitive Closure
10.8.9 Comparison of Methods
10.8.10 Transitive Closure by Graph Reduction
10.8.11 Approximating the Sizes of Neighborhoods
10.8.12 Exercises for Section 10.8
10.9 Summary of Chapter 10
10.10 References for Chapter 10
11 Dimensionality Reduction
11.1 Eigenvalues and Eigenvectors of Symmetric Matrices
11.1.1 Definitions
11.1.2 Computing Eigenvalues and Eigenvectors
11.1.3 Finding Eigenpairs by Power Iteration
11.1.4 The Matrix of Eigenvectors
11.1.5 Exercises for Section 11.1
11.2 Principal-Component Analysis
11.2.1 An Illustrative Example
11.2.2 Using Eigenvectors for Dimensionality Reduction
11.2.3 The Matrix of Distances
11.2.4 Exercises for Section 11.2
11.3 Singular-Value Decomposition
11.3.1 Definition of SVD
11.3.2 Interpretation of SVD
11.3.3 Dimensionality Reduction Using SVD
11.3.4 Why Zeroing Low Singular Values Works
11.3.5 Querying Using Concepts
11.3.6 Computing the SVD of a Matrix
11.3.7 Exercises for Section 11.3
11.4 CUR Decomposition
11.4.1 Definition of CUR
11.4.2 Choosing Rows and Columns Properly
11.4.3 Constructing the Middle Matrix
11.4.4 The Complete CUR Decomposition
11.4.5 Eliminating Duplicate Rows and Columns
11.4.6 Exercises for Section 11.4
11.5 Summary of Chapter 11
11.6 References for Chapter 11
12 Large-Scale Machine Learning
12.1 The Machine-Learning Model
12.1.1 Training Sets
12.1.2 Some Illustrative Examples
12.1.3 Approaches to Machine Learning
12.1.4 Machine-Learning Architecture
12.1.5 Exercises for Section 12.1
12.2 Perceptrons
12.2.1 Training a Perceptron with Zero Threshold
12.2.2 Convergence of Perceptrons
12.2.3 The Winnow Algorithm
12.2.4 Allowing the Threshold to Vary
12.2.5 Multiclass Perceptrons
12.2.6 Transforming the Training Set
12.2.7 Problems With Perceptrons
12.2.8 Parallel Implementation of Perceptrons
12.2.9 Exercises for Section 12.2
12.3 Support-Vector Machines
12.3.1 The Mechanics of an SVM
12.3.2 Normalizing the Hyperplane
12.3.3 Finding Optimal Approximate Separators
12.3.4 SVM Solutions by Gradient Descent
12.3.5 Stochastic Gradient Descent
12.3.6 Parallel Implementation of SVM
12.3.7 Exercises for Section 12.3
12.4 Learning from Nearest Neighbors
12.4.1 The Framework for Nearest-Neighbor Calculations
12.4.2 Learning with One Nearest Neighbor
12.4.3 Learning One-Dimensional Functions
12.4.4 Kernel Regression
12.4.5 Dealing with High-Dimensional Euclidean Data
12.4.6 Dealing with Non-Euclidean Distances
12.4.7 Exercises for Section 12.4
12.5 Decision Trees
12.5.1 Using a Decision Tree
12.5.2 Impurity Measures
12.5.3 Designing a Decision-Tree Node
12.5.4 Selecting a Test Using a Numerical Feature
12.5.5 Selecting a Test Using a Categorical Feature
12.5.6 Parallel Design of Decision Trees
12.5.7 Node Pruning
12.5.8 Decision Forests
12.5.9 Exercises for Section 12.5
12.6 Comparison of Learning Methods
12.7 Summary of Chapter 12
12.8 References for Chapter 12
13 Neural Nets and Deep Learning
13.1 Introduction to Neural Nets
13.1.1 Neural Nets, in General
13.1.2 Interconnections Among Nodes
13.1.3 Convolutional Neural Networks
13.1.4 Design Issues for Neural Nets
13.1.5 Exercises for Section 13.1
13.2 Dense Feedforward Networks
13.2.1 Linear Algebra Notation
13.2.2 Activation Functions
13.2.3 The Sigmoid
13.2.4 The Hyperbolic Tangent
13.2.5 Softmax
13.2.6 Rectified Linear Unit
13.2.7 Loss Functions
13.2.8 Regression Loss
13.2.9 Classification Loss
13.2.10 Exercises for Section 13.2
13.3 Backpropagation and Gradient Descent
13.3.1 Compute Graphs
13.3.2 Gradients, Jacobians, and the Chain Rule
13.3.3 The Backpropagation Algorithm
13.3.4 Iterating Gradient Descent
13.3.5 Tensors
13.3.6 Exercises for Section 13.3
13.4 Convolutional Neural Networks
13.4.1 Convolutional Layers
13.4.2 Convolution and Cross-Correlation
13.4.3 Pooling Layers
13.4.4 CNN Architecture
13.4.5 Implementation and Training
13.4.6 Exercises for Section 13.4
13.5 Recurrent Neural Networks
13.5.1 Training RNNs
13.5.2 Vanishing and Exploding Gradients
13.5.3 Long Short-Term Memory (LSTM)
13.5.4 Exercises for Section 13.5
13.6 Regularization
13.6.1 Norm Penalties
13.6.2 Dropout
13.6.3 Early Stopping
13.6.4 Dataset Augmentation
13.7 Summary of Chapter 13
13.8 References for Chapter 13
Index



