Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

Download the book Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

Price: 56,000 tomans (in stock)

Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way (original-language edition)

The book Data Engineering with Apache Spark, Delta Lake, and Lakehouse can be downloaded once payment is complete.
The book's description is given in the details section, where you can review it.


This book is the original edition and is not in Persian.




About the book Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

Title : Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way
Series :
Authors :
Publisher : Packt Publishing
Year of publication : 2021
Number of pages : 480
ISBN : 1801077746, 9781801077743
Language : English
Format : pdf
File size : 16 MB



The download link is provided once the payment process is complete. If you register and log in to your account, you will be able to view the list of books you have purchased.


Table of contents :


Cover
Title Page
Copyright and Credits
Dedication
Foreword
Contributors
Table of Contents
Preface
Section 1: Modern Data Engineering and Tools
Chapter 1: The Story of Data Engineering and Analytics
The journey of data
Exploring the evolution of data analytics
Core capabilities of storage and compute resources
Availability of varying datasets
The paradigm shift to distributed computing
Adoption of cloud computing
Data storytelling
The monetary power of data
Organic growth
Summary
Chapter 2: Discovering Storage and Compute Data Lakes
Introducing data lakes
Exploring the benefits of data lakes
Adhering to compliance frameworks
Segregating storage and compute in a data lake
Discovering data lake architectures
The CAP theorem
Summary
Chapter 3: Data Engineering on Microsoft Azure
Introducing data engineering in Azure
Performing data engineering in Microsoft Azure
Self-managed data engineering services (IaaS)
Azure-managed data engineering services (PaaS)
Data processing services in Microsoft Azure
Data engineering as a service (SaaS)
Data cataloging and sharing services in Microsoft Azure
Opening a free account with Microsoft Azure
Summary
Section 2: Data Pipelines and Stages of Data Engineering
Chapter 4: Understanding Data Pipelines
Exploring data pipelines
Components of a data pipeline
Process of creating a data pipeline
Discovery phase
Design phase
Development phase
Deployment phase
Running a data pipeline
Sample lakehouse project
Summary
Chapter 5: Data Collection Stage – The Bronze Layer
Architecting the Electroniz data lake
The cloud architecture
The pipeline design
The deployment strategy
Understanding the bronze layer
Configuring data sources
Data preparation
Configuring data destinations
Building the ingestion pipelines
Building a batch ingestion pipeline
Testing the ingestion pipelines
Building the streaming ingestion pipeline
Summary
Chapter 6: Understanding Delta Lake
Understanding how Delta Lake enables the lakehouse
Understanding Delta Lake
Preparing Azure resources
Creating a Delta Lake table
Changing data in an existing Delta Lake table
Performing time travel
Performing upserts of data
Understanding isolation levels
Understanding concurrency control
Cleaning up Azure resources
Summary
Chapter 7: Data Curation Stage – The Silver Layer
The need for curating raw data
Unstandardized data
Invalid data
Non-uniform data
Inconsistent data
Duplicate data
Insecure data
The process of curating raw data
Inspecting data
Getting approval
Cleaning data
Verifying data
Developing a data curation pipeline
Preparing Azure resources
Creating the pipeline for the silver layer
Running the pipeline for the silver layer
Verifying curated data in the silver layer
Verifying unstandardized data
Verifying invalid data
Verifying non-uniform data
Verifying duplicate data
Verifying insecure data
Cleaning up Azure resources
Summary
Chapter 8: Data Aggregation Stage – The Gold Layer
The need to aggregate data
The process of aggregating data
Developing a data aggregation pipeline
Preparing the Azure resources
Creating the pipeline for the gold layer
Running the aggregation pipeline
Understanding data consumption
Accessing silver layer data
Accessing gold layer data
Verifying aggregated data in the gold layer
Meeting customer expectations
Summary
Section 3: Data Engineering Challenges and Effective Deployment Strategies
Chapter 9: Deploying and Monitoring Pipelines in Production
The deployment strategy
Developing the master pipeline
Testing the master pipeline
Scheduling the master pipeline
Monitoring pipelines
Adding durability features
Dealing with failure conditions
Adding alerting features
Summary
Chapter 10: Solving Data Engineering Challenges
Schema evolution
Sharing data
Preparing the Azure resources
Creating a data share
Data governance
Preparing the Azure resources
Creating a data catalog
Cleaning up Azure resources
Summary
Chapter 11: Infrastructure Provisioning
Infrastructure as code
Deploying infrastructure using Azure Resource Manager
Creating ARM templates
Deploying ARM templates using the Azure portal
Deploying ARM templates using the Azure CLI
Deploying ARM templates containing secrets
Deploying multiple environments using IaC
Cleaning up Azure resources
Summary
Chapter 12: Continuous Integration and Deployment (CI/CD) of Data Pipelines
Understanding CI/CD
Traditional software delivery cycle
Modern software delivery cycle
Designing CI/CD pipelines
Developing CI/CD pipelines
Creating an Azure DevOps organization
Creating the Electroniz infrastructure CI/CD pipeline
Creating the Electroniz code CI/CD pipeline
Creating the CI/CD life cycle
Summary
About Packt
Other Books You May Enjoy
Index



