About the book HDInsight Essentials - Second Edition
Book title : HDInsight Essentials - Second Edition
Edition : 2
Title translated into Persian : HDInsight Essentials - چاپ دوم
Series :
Authors : Rajesh Nadipalli
Publisher : Packt Publishing
Year of publication : 2015
Number of pages : 179
ISBN : 9781784399429 , 1784399426
Language : English
Format : pdf
File size : 9 MB
Table of contents :
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Hadoop and HDInsight in a Heartbeat
  Data is everywhere
  Business value of big data
  Hadoop concepts
  Brief history of Hadoop
  Core components
  Hadoop cluster layout
  HDFS overview
  Writing a file to HDFS
  Reading a file from HDFS
  HDFS basic commands
  YARN overview
  YARN application life cycle
  YARN workloads
  Hadoop distributions
  HDInsight overview
  HDInsight and Hadoop relationship
  Hadoop on Windows deployment options
  Microsoft Azure HDInsight Service
  HDInsight Emulator
  Hortonworks Data Platform (HDP) for Windows
  Summary
Chapter 2: Enterprise Data Lake using HDInsight
  Enterprise Data Warehouse architecture
  Source systems
  Data warehouse
  Storage
  Processing
  User access
  Provisioning and monitoring
  Data governance and security
  Pain points of EDW
  The next generation Hadoop-based Enterprise data architecture
  Source systems
  Data Lake
  Storage
  Processing
  User access
  Provisioning and monitoring
  Data governance, security, and metadata
  Journey to your Data Lake dream
  Ingestion and organization
  Transformation (rules driven)
  Access, analyze, and report
  Tools and technology for Hadoop ecosystem
  Use case powered by Microsoft HDInsight
  Problem statement
  Solution
  Source systems
  Storage
  Processing
  User access
  Benefits
  Summary
Chapter 3: HDInsight Service on Azure
  Registering for an Azure account
  Azure storage
  Provisioning an HDInsight cluster
  Cluster topology
  Provisioning using Azure Powershell
  Creating a storage container
  Provisioning a new HDInsight cluster
  HDInsight management dashboard
  Dashboard
  Monitor
  Configuration
  Exploring clusters using the remote desktop
  Running a sample MapReduce
  Deleting the cluster
  HDInsight Emulator for the development
  Installing HDInsight Emulator
  Installation verification
  Using HDInsight Emulator
  Summary
Chapter 4: Administering Your HDInsight Cluster
  Monitoring cluster health
  Name Node status
  The Name Node Overview page
  Datanode Status
  Utilities and logs
  Hadoop Service Availability
  YARN Application Status
  Azure storage management
  Configuring your storage account
  Monitoring your storage account
  Managing access keys
  Deleting your storage account
  Azure Powershell
  Access Azure Blob storage using Azure Powershell
  Summary
Chapter 5: Ingest and Organize Data Lake
  End-to-end Data Lake solution
  Ingesting to Data Lake using HDFS command
  Connecting to a Hadoop client
  Getting your files on the local storage
  Transferring to HDFS
  Loading data to Azure Blob storage using Azure PowerShell
  Loading files to Data Lake using GUI tools
  Storage access keys
  Storage tools
  CloudXplorer
  Key benefits
  Registering your storage account
  Uploading files to your Blob storage
  Using Sqoop to move data from RDBMS to Data Lake
  Key benefits
  Two modes of using Sqoop
  Using Sqoop to import data (SQL to Hadoop)
  Organizing your Data Lake in HDFS
  Managing file metadata using HCatalog
  Key benefits
  Using HCatalog Command Line to create tables
  Summary
Chapter 6: Transform Data in the Data Lake
  Transformation overview
  Tools for transforming data in Data Lake
  HCatalog
  Persisting HCatalog metastore in a SQL database
  Apache Hive
  Hive architecture
  Starting Hive in HDInsight
  Basic Hive commands
  Apache Pig
  Pig architecture
  Starting Pig in HDInsight node
  Basic Pig commands
  Pig or Hive
  MapReduce
  The mapper code
  The reducer code
  The driver code
  Executing MapReduce on HDInsight
  Azure Powershell for execution of Hadoop jobs
  Transformation for the OTP project
  Cleaning data using Pig
  Executing Pig script
  Registering a refined and aggregate table using Hive
  Executing Hive script
  Reviewing results
  Other tools used for transformation
  Oozie
  Spark
  Summary
Chapter 7: Analyze and Report from Data Lake
  Data access overview
  Analysis using Excel and Microsoft Hive ODBC driver
  Prerequisites
  Step 1 – installing the Microsoft Hive ODBC driver
  Step 2 – creating Hive ODBC Data Source
  Step 3 – importing data to Excel
  Analysis using Excel Power Query
  Prerequisites
  Step 1 – installing the Microsoft Power Query for Excel
  Step 2 – importing Azure Blob storage data into Excel
  Step 3 – analyzing data using Excel
  Other BI features in Excel
  PowerPivot
  Power View and Power Map
  Step 1 – importing Azure Blob storage data into Excel
  Step 2 – launch map view
  Step 3 – configure the map
  Power BI Catalog
  Ad hoc analysis using Hive
  Other alternatives for analysis
  RHadoop
  Apache Giraph
  Apache Mahout
  Azure Machine Learning
  Summary
Chapter 8: HDInsight 3.1 New Features
  HBase
  HBase positioning in Data Lake and use cases
  Provisioning HDInsight HBase cluster
  Creating a sample HBase schema
  Designing the airline on-time performance table
  Connecting to HBase using the HBase shell
  Creating an HBase table
  Loading data to the HBase table
  Querying data from the HBase table
  HBase additional information
  Storm
  Storm positioning in Data Lake
  Storm key concepts
  Provisioning HDInsight Storm cluster
  Running a sample Storm topology
  Connecting to Storm using Storm shell
  Running the Storm Wordcount topology
  Monitoring status of the Wordcount topology
  Additional information on Storm
  Apache Tez
  Summary
Chapter 9: Strategy for a Successful Data Lake Implementation
  Challenges on building a production Data Lake
  The success path for a production Data Lake
  Identifying the big data problem
  Proof of technology for Data Lake
  Form a Data Lake Center of Excellence
  Executive sponsors
  Data Lake consumers
  Development
  Operations and infrastructure
  Architectural considerations
  Extensible and modular
  Metadata-driven solution
  Integration strategy
  Security
  Online resources
  Summary
Index