About the book: The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake
Title: The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake
Persian title: مجموعه ابزار Azure Data Lakehouse: ساخت و مقیاس‌بندی Lakehouseهای داده در Azure با Delta Lake، Apache Spark، Databricks، Synapse Analytics و Snowflake
Series: –
Author: Ron L'Esteve
Publisher: Apress
Publication year: 2022
Pages: 467
ISBN: 1484282329, 9781484282328
Language: English
Format: PDF
File size: 26 MB
Table of Contents:
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Part I: Getting Started
Chapter 1: The Data Lakehouse Paradigm
Background
Architecture
Ingestion and Processing
Data Factory
Databricks
Functions and Logic Apps
Synapse Analytics Serverless Pools
Stream Analytics
Messaging Hubs
Storing and Serving
Delta Lake
Synapse Analytics Dedicated SQL Pools
Relational Database
Purchasing Models (SQL DTU vs. vCore Database)
Service Tiers
Deployment Models
Non-relational Databases
Snowflake
Consumption
Analysis Services
Power BI
Power Apps
Advanced Analytics
Cognitive Services
Machine Learning
Continuous Integration, Deployment, and Governance
DevOps
Purview
Summary
Part II: Data Platforms
Chapter 2: Snowflake
Architecture
Cost
Security
Azure Key Vault
Azure Private Link
Applications
Replication and Failover
Data Integration with Azure
Data Lake Storage Gen2
Real-Time Data Loading with ADLS Gen2
Data Factory
Databricks
Data Transformation
Governance
Column-Level Security
Row-Level Security
Access History
Object Tagging
Sharing
Direct Share
Data Marketplace
Data Exchange
Continuous Integration and Deployment
Jenkins
Azure DevOps
Reporting
Power BI
Delta Lake, Machine Learning, and Constraints
Delta Lake
Machine Learning
Constraints
Summary
Chapter 3: Databricks
Workspaces
Data Science and Engineering
Machine Learning
SQL
Compute
Storage
Mount Data Lake Storage Gen2 Account
Getting Started
Create a Secret Scope
Mount Data Lake Storage Gen2
Read Data Lake Storage Gen2 from Databricks
Delta Lake
Reporting
Real-Time Analytics
Advanced Analytics
Security and Governance
Continuous Integration and Deployment
Integration with Synapse Analytics
Dynamic Data Encryption
Data Profile
Query Profile
Constraints
Identity
Delta Live Tables Merge
Summary
Chapter 4: Synapse Analytics
Workspaces
Storage
SQL Database (SQL Pools)
Lake Database
Integration Dataset
External Datasets
Development
Integration
Monitoring
Management
Reporting
Continuous Integration and Deployment
Real-Time Analytics
Structured Streaming
Synapse Link
Advanced Analytics
Security
Governance
Additional Features
Delta Tables
Machine Learning
SQL Server Integration Services Integration Runtime (SSIS IR)
Map Data Tool
Data Sharing
SQL Incremental
Constraints
Summary
Part III: Apache Spark ELT
Chapter 5: Pipelines and Jobs
Databricks
Data Factory
Mapping Data Flows
HDInsight Spark Activity
Scheduling and Monitoring
Synapse Analytics Workspace
Summary
Chapter 6: Notebook Code
PySpark
Excel
XML
JSON
ZIP
Scala
SQL
Optimizing Performance
Summary
Part IV: Delta Lake
Chapter 7: Schema Evolution
Schema Evolution Using Parquet Format
Schema Evolution Using Delta Format
Append
Overwrite
Summary
Chapter 8: Change Data Feed
Create Database and Tables
Insert Data into Tables
Change Data Capture
Streaming Changes
Summary
Chapter 9: Clones
Shallow Clones
Deep Clones
Summary
Chapter 10: Live Tables
Advantages of Delta Live Tables
Create a Notebook
Create and Run a Pipeline
Schedule a Pipeline
Explore Event Logs
Summary
Chapter 11: Sharing
Architecture
Share Data
Access Data
Sharing Data with Snowflake
Summary
Part V: Optimizing Performance
Chapter 12: Dynamic Partition Pruning
Partitions
Prerequisites
DPP Commands
Create Cluster
Create Notebook and Mount Data Lake
Create Fact Table
Verify Fact Table Partitions
Create Dimension Table
Join Results Without DPP Filter
Join Results with DPP Filter
Summary
Chapter 13: Z-Ordering and Data Skipping
Prepare Data in Delta Lake
Verify Data in Delta Lake
Create Hive Table
Run Optimize and Z-Order Commands
Verify Data Skipping
Summary
Chapter 14: Adaptive Query Execution
How It Works
Prerequisites
Comparing AQE Performance on Query with Joins
Create Datasets
Disable AQE
Enable AQE
Summary
Chapter 15: Bloom Filter Index
How a Bloom Filter Index Works
Create a Cluster
Create a Notebook and Insert Data
Enable Bloom Filter Index
Create Tables
Create a Bloom Filter Index
Optimize Table with Z-Order
Verify Performance Improvements
Summary
Chapter 16: Hyperspace
Prerequisites
Create Parquet Files
Run a Query Without an Index
Import Hyperspace
Read the Parquet Files to a Data Frame
Create a Hyperspace Index
Rerun the Query with Hyperspace Index
Other Hyperspace Management APIs
Summary
Part VI: Advanced Capabilities
Chapter 17: Auto Loader
Advanced Schema Evolution
Prerequisites
Generate Data from SQL Database
Load Data to Azure Data Lake Storage Gen2
Configure Resources in Azure Portal
Configure Databricks
Run Auto Loader in Databricks
Configuration Properties
Rescue Data
Schema Hints
Infer Column Types
Add New Columns
Managing Auto Loader Resources
Read a Stream
Write a Stream
Explore Results
Summary
Chapter 18: Python Wheels
Install Application Software
Install Visual Studio Code and Python Extension
Install Python
Configure Python Interpreter Path for Visual Studio Code
Verify Python Version in Visual Studio Code Terminal
Set Up Wheel Directory Folders and Files
Create Setup File
Create Readme File
Create License File
Create Init File
Create Package Function File
Install Python Wheel Packages
Install Wheel Package
Install Check Wheel Package
Create and Verify Wheel File
Create Wheel File
Check Wheel Contents
Verify Wheel File
Configure Databricks Environment
Install Wheel to Databricks Library
Create Databricks Notebook
Mount Data Lake Folder
Create Spark Database
Verify Wheel Package
Import Wheel Package
Create Function Parameters
Run Wheel Package Function
Show Spark Tables
Files in Databricks Repos
Continuous Integration and Deployment
Summary
Chapter 19: Security and Controls
Implement Cluster, Pool, and Jobs Access Control
Implement Workspace Access Control
Implement Other Access and Visibility Controls
Table Access Control
Personal Access Tokens
Visibility Controls
Example Row-Level Security Implementation
Create New User Groups
Load Sample Data
Create Delta Tables
Run Queries Using Row-Level Security
Create Row-Level Secured Views and Grant Selective User Access
Interaction with Azure Active Directory
Summary
Index