About the book Elements of Dimensionality Reduction and Manifold Learning
Book title: Elements of Dimensionality Reduction and Manifold Learning
Edition: 1
Persian title: عناصر کاهش ابعاد و یادگیری منیفلد
Authors: Benyamin Ghojogh, Mark Crowley, Fakhri Karray, Ali Ghodsi
Publisher: Springer Nature Switzerland AG
Year of publication: 2023
Number of pages: 617
ISBN: 9783031106019, 9783031106026
Language: English
File format: PDF
File size: 9 MB
Table of Contents:
Preface
How This Book Can Be Useful
Motivation
Targeted Readers
Corresponding Courses
Organization of the Book
Preliminaries and Background
Spectral Dimensionality Reduction
Probabilistic Dimensionality Reduction
Neural Network-Based Dimensionality Reduction
Other Related Books
References
Acknowledgment
Contents
1 Introduction
1.1 Introduction
1.1.1 Dataset
1.1.2 Manifold Hypothesis
1.1.3 Feature Engineering
1.2 Dimensionality Reduction and Manifold Learning
1.2.1 Spectral Dimensionality Reduction
1.2.2 Probabilistic Dimensionality Reduction
1.2.3 Neural Network-Based Dimensionality Reduction
1.2.4 Other Types of Dimensionality Reduction
1.3 A Brief History of Dimensionality Reduction
1.3.1 History of Spectral Dimensionality Reduction
1.3.2 History of Probabilistic Dimensionality Reduction
1.3.3 History of Neural Network-Based Dimensionality Reduction
1.4 Chapter Summary
References
Part I Preliminaries and Background
2 Background on Linear Algebra
2.1 Introduction
2.2 The Centering Matrix
2.3 Linear Projection
2.3.1 A Projection Point of View
2.3.2 Projection and Reconstruction
2.3.2.1 Projection and Reconstruction for Noncentered Data
2.3.2.2 Projection and Reconstruction for Centered Data
2.4 Rayleigh-Ritz Quotient
2.4.1 Generalized Rayleigh-Ritz Quotient
2.5 Eigenvalue and Generalized Eigenvalue Problems
2.5.1 Introducing Eigenvalue and Generalized Eigenvalue Problems
2.5.1.1 Eigenvalue Problem
2.5.1.2 Generalized Eigenvalue Problem
2.5.2 Generalized Eigenvalue Optimization
2.5.3 Solution to the Eigenvalue Problem
2.5.4 Solution to Generalized Eigenvalue Problem
2.5.4.1 The Simplified Solution
2.5.4.2 The Rigorous Solution
2.6 Singular Value Decomposition
2.7 Chapter Summary
Appendix 2.1: Proof of Why the Lagrange Multiplier is Diagonal in the Eigenvalue Problem
Discussion on the Number of Constraints
Proof of Being Diagonal
Appendix 2.2: Discussion of the Sorting of Eigenvectors and Eigenvalues
References
3 Background on Kernels
3.1 Introduction
3.2 Reproducing Kernel Hilbert Space
3.2.1 Mercer Kernel and Gram Matrix
3.2.2 RKHS as a Hilbert Space
3.2.3 Reproducing Property
3.2.4 Mercer's Theorem
3.3 Feature Map and Pulling Function
3.4 Well-Known Kernel Functions
3.4.1 Frequently Used Kernels
3.4.2 Kernel Construction from Distance Metric
3.5 Kernel Centering and Normalization
3.5.1 Kernel Centering
3.5.1.1 Centering the Kernel of Training Data
3.5.1.2 Centering the Kernel Between Training and Out-of-Sample Data
3.5.2 Kernel Normalization
3.6 Eigenfunctions
3.6.1 Eigenfunctions
3.6.2 Use of Eigenfunctions for Spectral Embedding
3.7 Kernelization Techniques
3.7.1 Kernelization by Kernel Trick
3.7.2 Kernelization by Representation Theory
3.7.2.1 Kernelization for Vector Solution
3.7.2.2 Kernelization for Matrix Solution
3.8 Difference Measures Using Kernels
3.8.1 Distance in RKHS
3.8.2 Hilbert-Schmidt Independence Criterion (HSIC)
3.8.3 Maximum Mean Discrepancy (MMD)
3.9 Factorization of Kernel and the Nyström Method
3.9.1 Singular Value and Eigenvalue Decompositions of Kernel
3.9.2 Nyström Method for Approximation of Eigenfunctions
3.9.3 Nyström Method for Kernel Completion and Approximation
3.9.4 Use of Nyström Approximation for Landmark Spectral Embedding
3.10 Chapter Summary
References
4 Background on Optimization
4.1 Introduction
4.2 Notations and Preliminaries
4.2.1 Preliminaries on Sets and Norms
4.2.2 Preliminaries on Optimization
4.2.3 Preliminaries on Derivative
4.3 Optimization Problem
4.4 Karush-Kuhn-Tucker Conditions
4.4.1 The Lagrangian Function
4.4.1.1 Lagrangian and Dual Variables
4.4.1.2 Sign of Terms in Lagrangian
4.4.1.3 Interpretation of Lagrangian
4.4.1.4 Lagrange Dual Function
4.4.2 The Dual Problem, Weak and Strong Duality, and Slater's Condition
4.4.3 KKT Conditions
4.5 First-Order Optimization: Gradient Methods
4.5.1 Gradient Descent
4.5.1.1 Step of Update
4.5.1.2 Line-Search
4.5.1.3 Backtracking Line-Search
4.5.1.4 Convergence Criterion
4.5.2 Stochastic Gradient Methods
4.5.2.1 Stochastic Gradient Descent
4.5.2.2 Minibatch Stochastic Gradient Descent
4.5.3 Projected Gradient Method
4.6 Second-Order Optimization: Newton's Method
4.6.1 Newton's Method from the Newton-Raphson Root Finding Method
4.6.2 Newton's Method for Unconstrained Optimization
4.6.3 Newton's Method for Equality-Constrained Optimization
4.6.4 Interior-Point and Barrier Methods: Newton's Method for Inequality-Constrained Optimization
4.7 Distributed Optimization
4.7.1 Alternating Optimization
4.7.2 Dual Ascent and Dual Decomposition Methods
4.7.3 Augmented Lagrangian Method (Method of Multipliers)
4.7.4 Alternating Direction Method of Multipliers
4.7.4.1 ADMM Algorithm
4.7.4.2 Simplifying Equations in ADMM
4.7.5 ADMM Algorithm for General Optimization Problems and Any Number of Variables
4.7.5.1 Distributed Optimization
4.7.5.2 Making Optimization Problem Distributed
4.8 Chapter Summary
References
Part II Spectral Dimensionality Reduction
5 Principal Component Analysis
5.1 Introduction
5.2 Principal Component Analysis
5.2.1 Projection and Reconstruction in PCA
5.2.2 PCA Using Eigen-Decomposition
5.2.2.1 Projection onto One Direction
5.2.2.2 Projection onto Span of Several Directions
5.2.3 Properties of U
5.2.3.1 Rank of the Covariance Matrix
5.2.3.2 Truncating U
5.2.4 Reconstruction Error in PCA
5.2.4.1 Reconstruction in Linear Projection
5.2.4.2 Reconstruction in Autoencoder
5.2.5 PCA Using Singular Value Decomposition
5.2.6 Determining the Number of Principal Directions
5.3 Dual Principal Component Analysis
5.3.1 Projection
5.3.2 Reconstruction
5.3.3 Out-of-Sample Projection
5.3.4 Out-of-Sample Reconstruction
5.3.5 Why Is Dual PCA Useful?
5.4 Kernel Principal Component Analysis
5.4.1 Kernels and Hilbert Space
5.4.2 Projection
5.4.3 Reconstruction
5.4.4 Out-of-Sample Projection
5.4.5 Out-of-Sample Reconstruction
5.4.6 Why Is Kernel PCA Useful?
5.5 Supervised Principal Component Analysis Using Scoring
5.6 Supervised Principal Component Analysis Using HSIC
5.6.1 Supervised PCA
5.6.2 PCA Is a Special Case of SPCA!
5.6.3 Dual Supervised PCA
5.6.4 Kernel Supervised PCA
5.6.4.1 Kernel SPCA Using Direct SPCA
5.6.4.2 Kernel SPCA Using Dual SPCA
5.7 Eigenfaces
5.7.1 Projection Directions of Facial Images
5.7.2 Projection of Facial Images
5.7.3 Reconstruction of Facial Images
5.8 Chapter Summary
References
6 Fisher Discriminant Analysis
6.1 Introduction
6.2 Projection and Reconstruction
6.2.1 Projection Formulation
6.3 Fisher Discriminant Analysis
6.3.1 One-Dimensional Subspace
6.3.1.1 Scatters in Two-Class Case
6.3.1.2 Scatters in Multiclass Case: Variant 1
6.3.1.3 Scatters in Multiclass Case: Variant 2
6.3.1.4 Fisher Subspace: Variant 1
6.3.1.5 Fisher Subspace: Variant 2
6.3.2 Multidimensional Subspace
6.3.3 Discussion of the Dimensionality of the Fisher Subspace
6.4 Interpretation of FDA: The Example of a Man with Weak Eyes
6.5 Robust Fisher Discriminant Analysis
6.6 Comparison of FDA and PCA Directions
6.7 On Equivalency of FDA and LDA
6.8 Kernel Fisher Discriminant Analysis
6.8.1 Kernels and Hilbert Space
6.8.2 One-Dimensional Subspace
6.8.2.1 Scatters in Two-Class Case
6.8.2.2 Scatters in Multiclass Case: Variant 1
6.8.2.3 Scatters in Multiclass Case: Variant 2
6.8.3 Multidimensional Subspace
6.8.4 Discussion of the Dimensionality of the Kernel Fisher Subspace
6.9 Fisherfaces
6.9.1 Projection Directions of Facial Images
6.9.2 Projection of Facial Images
6.9.3 Reconstruction of Facial Images
6.9.4 Out-of-Sample Projection of Facial Images
6.10 Chapter Summary
References
7 Multidimensional Scaling, Sammon Mapping, and Isomap
7.1 Introduction
7.2 Multidimensional Scaling
7.2.1 Classical Multidimensional Scaling
7.2.1.1 Classical MDS with Euclidean Distance
7.2.1.2 Generalized Classical MDS (Kernel Classical MDS)
7.2.1.3 Equivalence of PCA and Kernel PCA with Classical MDS and Generalized Classical MDS, Respectively
7.2.2 Metric Multidimensional Scaling
7.2.3 Nonmetric Multidimensional Scaling
7.3 Sammon Mapping
7.4 Isometric Mapping (Isomap)
7.4.1 Isomap
7.4.1.1 Geodesic Distance
7.4.1.2 Isomap Formulation
7.4.2 Kernel Isomap
7.5 Out-of-Sample Extensions for MDS and Isomap
7.5.1 Out of Sample for Isomap and MDS Using Eigenfunctions
7.5.2 Out of Sample for Isomap, Kernel Isomap, and MDS Using Kernel Mapping
7.6 Landmark MDS and Landmark Isomap for Big Data Embedding
7.6.1 Using Kernel Approximation in Landmark MDS
7.6.2 Using Distance Matrix in Landmark MDS
7.7 Chapter Summary
References
8 Locally Linear Embedding
8.1 Introduction
8.2 Locally Linear Embedding
8.2.1 k-Nearest Neighbours
8.2.2 Linear Reconstruction by the Neighbours
8.2.3 Linear Embedding
8.2.4 Additional Notes on LLE
8.2.4.1 Inverse Locally Linear Embedding
8.2.4.2 Feature Fusion in LLE
8.3 Kernel Locally Linear Embedding
8.3.1 k-Nearest Neighbours
8.3.2 Linear Reconstruction by the Neighbours
8.3.3 Linear Embedding
8.4 Out-of-Sample Embedding in LLE
8.4.1 Out-of-Sample Embedding Using Linear Reconstruction
8.4.2 Out-of-Sample Embedding Using Eigenfunctions
8.4.3 Out-of-Sample Embedding Using Kernel Mapping
8.5 Incremental LLE
8.6 Landmark Locally Linear Embedding for Big Data Embedding
8.6.1 Landmark LLE Using Nyström Approximation
8.6.2 Landmark LLE Using Locally Linear Landmarks
8.7 Parameter Selection of the Number of Neighbours in LLE
8.7.1 Parameter Selection Using Residual Variance
8.7.2 Parameter Selection Using Procrustes Statistics
8.7.3 Parameter Selection Using Preservation Neighbourhood Error
8.7.4 Parameter Selection Using Local Neighbourhood Selection
8.8 Supervised and Semisupervised LLE
8.8.1 Supervised LLE
8.8.2 Enhanced Supervised LLE
8.8.3 Supervised LLE Projection
8.8.4 Probabilistic Supervised LLE
8.8.5 Semi-Supervised LLE
8.8.6 Supervised Guided LLE
8.8.6.1 Seeing LLE as Kernel PCA
8.8.6.2 Interpreting LLE Using HSIC
8.8.6.3 Guiding LLE Using Labels
8.8.7 Other Supervised Algorithms
8.9 Robust Locally Linear Embedding
8.9.1 Robust LLE Using the Least Squares Problem
8.9.2 Robust LLE Using Penalty Functions
8.9.2.1 RLLE with ℓ2 Norm Penalty
8.9.2.2 RLLE with Elastic-Net Penalty
8.10 Fusion of LLE with Other Dimensionality Reduction Methods
8.10.1 LLE with Geodesic Distances: Fusion of LLE with Isomap
8.10.2 Fusion of LLE with PCA
8.10.3 Fusion of LLE with FDA (or LDA)
8.10.4 Fusion of LLE with FDA and Graph Embedding: Discriminant LLE
8.11 Weighted Locally Linear Embedding
8.11.1 Weighted LLE for Deformed Distributed Data
8.11.2 Weighted LLE Using Probability of Occurrence
8.11.3 Supervised LLE by Adjusting Weights
8.11.4 Modified Locally Linear Embedding
8.11.5 Iterative Locally Linear Embedding
8.12 Chapter Summary
References
9 Laplacian-Based Dimensionality Reduction
9.1 Introduction
9.2 Laplacian Matrix and Its Interpretation
9.2.1 Adjacency Matrix
9.2.2 Laplacian Matrix
9.2.3 Interpretation of Laplacian
9.2.4 Eigenvalues of Laplacian Matrix
9.2.5 Convergence of Laplacian
9.3 Spectral Clustering
9.3.1 The Spectral Clustering Algorithm
9.3.1.1 Adjacency Matrix
9.3.1.2 The Cut
9.3.1.3 Optimization of Spectral Clustering
9.3.1.4 Solution to Optimization
9.3.1.5 Extension of Spectral Clustering to Multiple Clusters
9.3.1.6 Optimization Approach 2
9.3.2 Other Improvements Over Spectral Clustering
9.4 Laplacian Eigenmap
9.4.1 Laplacian Eigenmap
9.4.1.1 Adjacency Matrix
9.4.1.2 Interpretation of Laplacian Eigenmap
9.4.1.3 Optimization Approach 1
9.4.1.4 Optimization Approach 2
9.4.2 Out-of-Sample Extension for Laplacian Eigenmap
9.4.2.1 Embedding Using Eigenfunctions
9.4.2.2 Out-of-Sample Embedding
9.4.3 Other Improvements Over the Laplacian Eigenmap
9.5 Locality Preserving Projection
9.5.1 Locality Preserving Projection
9.5.1.1 One-Dimensional Subspace
9.5.1.2 Multidimensional Subspace
9.5.2 Kernel Locality Preserving Projection
9.5.3 Other Improvements Over Locality Preserving Projection
9.6 Graph Embedding
9.6.1 Direct Graph Embedding
9.6.2 Linearized Graph Embedding
9.6.3 Kernelized Graph Embedding
9.6.4 Special Cases of Graph Embedding
9.6.4.1 Laplacian Eigenmap, LPP, and Kernel LPP
9.6.4.2 PCA and Kernel PCA
9.6.4.3 FDA and Kernel FDA
9.6.4.4 MDS and Isomap
9.6.4.5 LLE
9.6.5 Other Improvements Over Graph Embedding
9.7 Diffusion Map
9.7.1 Discrete Time Markov Chain
9.7.2 The Optimization Problem
9.7.3 Diffusion Distance
9.7.4 Other Improvements Over Diffusion Maps
9.8 Chapter Summary
References
10 Unified Spectral Framework and Maximum Variance Unfolding
10.1 Introduction
10.2 Unified Framework for Spectral Methods
10.2.1 Learning Eigenfunctions
10.2.2 Unified Framework as Kernel PCA
10.2.3 Summary of Kernels in Spectral Methods
10.2.4 Generalized Embedding
10.3 Kernel Learning for Transduction
10.4 Maximum Variance Unfolding (or Semidefinite Embedding)
10.4.1 Intuitions and Comparison with Kernel PCA
10.4.2 Local Isometry
10.4.3 Centering
10.4.4 Positive Semidefiniteness
10.4.5 Manifold Unfolding
10.4.6 Spectral Embedding
10.5 Supervised Maximum Variance Unfolding
10.5.1 Supervised MVU Using kNN Within Classes
10.5.2 Supervised MVU by Classwise Unfolding
10.5.3 Supervised MVU by Fisher Criterion
10.5.4 Supervised MVU by Coloured MVU
10.6 Out-of-Sample Extension of MVU
10.6.1 Out-of-Sample Extension Using Eigenfunctions
10.6.2 Out-of-Sample Extension Using Kernel Mapping
10.7 Other Variants of Maximum Variance Unfolding
10.7.1 Action Respecting Embedding
10.7.2 Relaxed Maximum Variance Unfolding
10.7.2.1 Short Circuits in the kNN Graph
10.7.2.2 Rescaling Local Distances
10.7.3 Landmark Maximum Variance Unfolding for Big Data
10.7.4 Other Improvements Over Maximum Variance Unfolding and Kernel Learning
10.8 Chapter Summary
References
11 Spectral Metric Learning
11.1 Introduction
11.2 Generalized Mahalanobis Distance Metric
11.2.1 Distance Metric
11.2.2 Mahalanobis Distance
11.2.3 Generalized Mahalanobis Distance
11.2.4 The Main Idea of Metric Learning
11.3 Spectral Methods Using Scatters
11.3.1 The First Spectral Method
11.3.1.1 Formulating as Semidefinite Programming
11.3.1.2 Relevant to Fisher Discriminant Analysis
11.3.1.3 Relevant Component Analysis (RCA)
11.3.1.4 Discriminative Component Analysis (DCA)
11.3.1.5 High-Dimensional Discriminative Component Analysis
11.3.1.6 Regularization by Locally Linear Embedding
11.3.1.7 Fisher-HSIC Multiview Metric Learning (FISH-MML)
11.3.2 Spectral Methods Using Hinge Loss
11.3.2.1 Large-Margin Metric Learning
11.3.2.2 Imbalanced Metric Learning (IML)
11.3.3 Locally Linear Metric Adaptation (LLMA)
11.3.4 Relevant to Support Vector Machine
11.3.5 Relevant to Multidimensional Scaling
11.3.6 Kernel Spectral Metric Learning
11.3.6.1 Using Eigenvalue Decomposition of the Kernel
11.3.6.2 Regularization by Locally Linear Embedding
11.3.6.3 Regularization by Laplacian
11.3.6.4 Kernel Discriminative Component Analysis
11.3.6.5 Relevant to Kernel Fisher Discriminant Analysis
11.3.6.6 Relevant to Kernel Support Vector Machine
11.3.7 Geometric Spectral Metric Learning
11.3.7.1 Geometric Mean Metric Learning
11.3.7.2 Low-Rank Geometric Mean Metric Learning
11.3.7.3 Geometric Mean Metric Learning for Partial Labels
11.3.7.4 Geometric Mean Metric Learning on SPD and Grassmannian Manifolds
11.3.7.5 Metric Learning on Stiefel and SPD Manifolds
11.3.7.6 Curvilinear Distance Metric Learning (CDML)
11.3.8 Adversarial Metric Learning (AML)
11.4 Chapter Summary
References
Part III Probabilistic Dimensionality Reduction
12 Factor Analysis and Probabilistic Principal Component Analysis
12.1 Introduction
12.2 Variational Inference
12.2.1 Evidence Lower Bound (ELBO)
12.2.2 Expectation Maximization
12.2.2.1 Background on Expectation Maximization
12.2.2.2 Expectation Maximization in Variational Inference
12.3 Factor Analysis
12.3.1 Background on Marginal Multivariate Gaussian Distribution
12.3.2 The Main Idea Behind Factor Analysis
12.3.3 The Factor Analysis Model
12.3.4 The Joint and Marginal Distributions in Factor Analysis
12.3.5 Expectation Maximization in Factor Analysis
12.3.5.1 Maximization of Joint Likelihood
12.3.5.2 The E-Step in EM for Factor Analysis
12.3.5.3 The M-Step in EM for Factor Analysis
12.3.5.4 Summary of Factor Analysis Algorithm
12.4 Probabilistic Principal Component Analysis
12.4.1 Main Idea of Probabilistic PCA
12.4.2 MLE for Probabilistic PCA
12.4.2.1 MLE for Determining Λ
12.4.2.2 MLE for Determining σ
12.4.2.3 Summary of MLE Formulas
12.4.3 Zero Noise Limit: PCA Is a Special Case of Probabilistic PCA
12.4.4 Other Variants of Probabilistic PCA
12.5 Chapter Summary
References
13 Probabilistic Metric Learning
13.1 Introduction
13.2 Collapsing Classes
13.2.1 Collapsing Classes in the Input Space
13.2.2 Collapsing Classes in the Feature Space
13.3 Neighbourhood Component Analysis Methods
13.3.1 Neighbourhood Component Analysis (NCA)
13.3.2 Regularized Neighbourhood Component Analysis
13.3.3 Fast Neighbourhood Component Analysis
13.3.3.1 Fast NCA
13.3.3.2 Kernel Fast NCA
13.4 Bayesian Metric Learning Methods
13.4.1 Bayesian Metric Learning Using the Sigmoid Function
13.4.2 Bayesian Neighbourhood Component Analysis
13.4.3 Local Distance Metric (LDM)
13.5 Information Theoretic Metric Learning
13.5.1 Information Theoretic Metric Learning with a Prior Weight Matrix
13.5.1.1 Offline Information Theoretic Metric Learning
13.5.1.2 Online Information Theoretic Metric Learning
13.5.2 Information Theoretic Metric Learning for Imbalanced Data
13.5.3 Probabilistic Relevant Component Analysis Methods
13.5.4 Metric Learning by Information Geometry
13.6 Empirical Risk Minimization in Metric Learning
13.6.1 Metric Learning Using the Sigmoid Function
13.6.2 Pairwise Constrained Component Analysis (PCCA)
13.6.3 Metric Learning for Privileged Information
13.7 Chapter Summary
References
14 Random Projection
14.1 Introduction
14.2 Linear Random Projection
14.3 The Johnson-Lindenstrauss Lemma
14.4 Sparse Linear Random Projection
14.5 Applications of Linear Random Projection
14.5.1 Low-Rank Matrix Approximation Using Random Projection
14.5.2 Approximate Nearest Neighbour Search
14.5.2.1 Random Projection onto a Hypercube
14.5.2.2 Approximate Nearest Neighbour Search by Random Projection
14.6 Random Fourier Features and Random Kitchen Sinks for Nonlinear Random Projection
14.6.1 Random Fourier Features for Learning with Approximate Kernels
14.6.2 Random Kitchen Sinks for Nonlinear Random Projection
14.7 Other Methods for Nonlinear Random Projection
14.7.1 Extreme Learning Machine
14.7.2 Randomly Weighted Neural Networks
14.7.2.1 Distance Preservation by Deterministic Layers
14.7.2.2 Distance Preservation by Random Layers
14.7.3 Ensemble of Random Projections
14.8 Chapter Summary
References
15 Sufficient Dimension Reduction and Kernel Dimension Reduction
15.1 Introduction
15.2 Preliminaries and Notations
15.3 Inverse Regression Methods
15.3.1 Sliced Inverse Regression (SIR)
15.3.2 Sliced Average Variance Estimation (SAVE)
15.3.3 Parametric Inverse Regression (PIR)
15.3.4 Contour Regression (CR)
15.3.5 Directional Regression (DR)
15.3.6 Likelihood-Based Methods
15.3.6.1 Principal Fitted Components (PFC)
15.3.6.2 Likelihood Acquired Direction (LAD)
15.3.7 Graphical Regression
15.4 Forward Regression Methods
15.4.1 Principal Hessian Directions (pHd)
15.4.2 Minimum Average Variance Estimation (MAVE)
15.4.3 Conditional Variance Estimation (CVE)
15.4.4 Deep Sufficient Dimension Reduction
15.4.4.1 Deep Variational Sufficient Dimension Reduction (DVSDR)
15.4.4.2 Meta-Learning for Sufficient Dimension Reduction
15.5 Kernel Dimension Reduction
15.5.1 Supervised Kernel Dimension Reduction
15.5.1.1 Supervised KDR by Projected Gradient Descent
15.5.1.2 Supervised KDR by Riemannian Optimization
15.5.1.3 Formulation of Supervised KDR by HSIC
15.5.2 Supervised KDR for Nonlinear Regression
15.5.3 Unsupervised Kernel Dimension Reduction
15.6 Chapter Summary
References
16 Stochastic Neighbour Embedding
16.1 Introduction
16.2 Stochastic Neighbour Embedding (SNE)
16.3 Symmetric Stochastic Neighbour Embedding
16.4 t-Distributed Stochastic Neighbour Embedding (t-SNE)
16.4.1 The Crowding Problem
16.4.2 t-SNE Formulation
16.4.3 Early Exaggeration
16.5 General Degrees of Freedom in t-SNE
16.6 Out-of-Sample Embedding
16.7 Accelerating SNE and t-SNE
16.7.1 Acceleration Using Out-of-Sample Embedding
16.7.2 Acceleration Using Random Walk
16.8 Recent Improvements of t-SNE
16.9 Chapter Summary
References
17 Uniform Manifold Approximation and Projection (UMAP)
17.1 Introduction
17.2 UMAP
17.2.1 Data Graph in the Input Space
17.2.2 Data Graph in the Embedding Space
17.2.3 Optimization Cost Function
17.2.4 The Training Algorithm of UMAP
17.2.5 Supervised and Semisupervised Embedding
17.3 Justifying UMAP's Cost Function by Algebraic Topology and Category Theory
17.4 Neighbour Embedding: Comparison with t-SNE and LargeVis
17.5 Discussion on Repulsive Forces and Negative Sampling in UMAP's Cost Function
17.5.1 UMAP's Emphasis on Repulsive Forces
17.5.2 UMAP's Effective Cost Function
17.6 DensMAP for Density-Preserving Embedding
17.7 Parametric UMAP for Embedding by Deep Learning
17.8 Progressive UMAP for Streaming and Out-of-Sample Data
17.9 Chapter Summary
References
Part IV Neural Network-Based Dimensionality Reduction
18 Restricted Boltzmann Machine and Deep Belief Network
18.1 Introduction
18.2 Background
18.2.1 Probabilistic Graphical Model and Markov Random Field
18.2.2 Gibbs Sampling
18.2.3 Statistical Physics and Ising Model
18.2.3.1 Boltzmann (Gibbs) Distribution
18.2.3.2 Ising Model
18.2.4 Hopfield Network
18.3 Restricted Boltzmann Machine
18.3.1 Structure of Restricted Boltzmann Machine
18.3.2 Conditional Distributions
18.3.3 Sampling Hidden and Visible Variables
18.3.3.1 Gibbs Sampling
18.3.3.2 Generations and Evaluations by Gibbs Sampling
18.3.4 Training Restricted Boltzmann Machine by Maximum Likelihood Estimation
18.3.5 Contrastive Divergence
18.3.6 Boltzmann Machine
18.4 Distributions of Visible and Hidden Variables
18.4.1 Modelling with Exponential Family Distributions
18.4.2 Binary States
18.4.3 Continuous Values
18.4.4 Discrete Poisson States
18.5 Conditional Restricted Boltzmann Machine
18.6 Deep Belief Network
18.6.1 Stacking RBM Models
18.6.2 Other Improvements over RBM and DBN
18.7 Chapter Summary
References
19 Deep Metric Learning
19.1 Introduction
19.2 Reconstruction Autoencoders
19.2.1 Types of Autoencoders
19.2.2 Reconstruction Loss
19.2.3 Denoising Autoencoder
19.2.4 Metric Learning by Reconstruction Autoencoder
19.3 Supervised Metric Learning by Supervised Loss Functions
19.3.1 Mean Squared Error and Mean Absolute Value Losses
19.3.2 Huber and KL-Divergence Loss
19.3.3 Hinge Loss
19.3.4 Cross-Entropy Loss
19.4 Metric Learning by Siamese Networks
19.4.1 Siamese and Triplet Networks
19.4.2 Pairs and Triplets of Data Points
19.4.3 Implementation of Siamese Networks
19.4.4 Contrastive Loss
19.4.4.1 Contrastive Loss
19.4.4.2 Generalized Contrastive Loss
19.4.5 Triplet Loss
19.4.6 Tuplet Loss
19.4.7 Neighbourhood Component Analysis Loss
19.4.8 Proxy Neighbourhood Component Analysis Loss
19.4.9 Softmax Triplet Loss
19.4.10 Triplet Global Loss
19.4.11 Angular Loss
19.4.12 SoftTriple Loss
19.4.13 Fisher Siamese Losses
19.4.14 Deep Adversarial Metric Learning
19.4.15 Triplet Mining
19.4.15.1 Batch-All
19.4.15.2 Batch-Hard
19.4.15.3 Batch-Semi-Hard
19.4.15.4 Easy-Positive
19.4.15.5 Lifted Embedding Loss
19.4.15.6 Hard Mining Center-Triplet Loss
19.4.15.7 Triplet Loss with Cross-Batch Memory
19.4.16 Triplet Sampling
19.4.16.1 Distance Weighted Sampling
19.4.16.2 Sampling by Bayesian Updating Theorem
19.4.16.3 Hard Negative Sampling
19.5 Deep Discriminant Analysis Metric Learning
19.5.1 Deep Probabilistic Discriminant Analysis
19.5.2 Discriminant Analysis with Virtual Samples
19.5.3 Deep Fisher Discriminant Analysis
19.6 Multimodal Deep Metric Learning
19.7 Geometric Metric Learning by Neural Network
19.8 Few-Shot Metric Learning
19.8.1 Multiscale Metric Learning
19.8.2 Metric Learning with Continuous Similarity Scores
19.9 Chapter Summary
References
20 Variational Autoencoders
20.1 Introduction
20.2 Parts of the Variational Autoencoder
20.2.1 Encoder of Variational Autoencoder
20.2.2 Sampling the Latent Variable
20.2.3 Decoder of Variational Autoencoder
20.3 Training Variational Autoencoder with Expectation Maximization
20.3.1 Simplification Type 1
20.3.2 Simplification Type 2
20.3.3 Simplification Type 2 for Special Case of Gaussian Distributions
20.3.4 Training Variational Autoencoder with Approximations
20.3.5 Prior Regularization
20.4 The Reparameterization Trick
20.5 Training Variational Autoencoder with Backpropagation
20.6 The Test Phase in the Variational Autoencoder
20.7 Other Notes and Other Variants of the Variational Autoencoder
20.8 Chapter Summary
Appendix 20.1: Proof for Lemma 20.1
References
21 Adversarial Autoencoders
21.1 Introduction
21.2 Generative Adversarial Network (GAN)
21.2.1 Adversarial Learning: The Adversarial Game
21.2.2 Optimization and Loss Function
21.2.3 Network Structure of GAN
21.2.4 Optimal Solution of GAN
21.2.5 Convergence and Equilibrium Analysis of GAN
21.3 Mode Collapse Problem in GAN
21.4 Autoencoders Based on Adversarial Learning
21.4.1 Adversarial Autoencoder (AAE)
21.4.1.1 Unsupervised AAE
21.4.1.2 Sampling the Latent Variable
21.4.1.3 Supervised AAE
21.4.1.4 Semisupervised AAE
21.4.1.5 Unsupervised Clustering with AAE
21.4.1.6 Dimensionality Reduction with AAE
21.4.2 PixelGAN Autoencoder
21.4.3 Implicit Autoencoder (IAE)
21.5 Chapter Summary
References
Index