Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours (Paperback)

Manpreet Singh, Arshad Ali

  • Publisher: SAMS
  • Publication date: 2019-12-01
  • List price: $1,650
  • VIP price: $1,568 (5% off)
  • Language: English
  • Pages: 592
  • Binding: Paperback
  • ISBN: 0672337274
  • ISBN-13: 9780672337277
  • Related categories: Big Data, Data Science
  • Ships immediately (stock: 1)



With Microsoft HDInsight, business professionals and data analysts can rapidly leverage the power of Hadoop on a flexible, scalable cloud-based platform, using Microsoft's accessible business intelligence, visualization, and productivity tools. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to provision, configure, monitor, troubleshoot, and use HDInsight, even if you're new to big data analytics. Each short, easy lesson builds on all that's come before: you'll learn all of HDInsight's essentials as you solve real data analytics problems. Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours covers all this, and much more:
  • Introduction to Big Data and NoSQL systems, their business value proposition, and example use cases
  • Introduction to Hadoop, its architecture and ecosystem, and Microsoft HDInsight
  • Getting to know Hadoop 2.0 and the innovations it provides, such as HDFS2 and YARN
  • Quickly installing, configuring, and monitoring Hadoop (HDInsight) clusters in the cloud and automating cluster provisioning
  • Customizing the HDInsight cluster and installing additional Hadoop ecosystem projects using Script Actions
  • Administering HDInsight from the Hadoop command prompt or Microsoft PowerShell
  • Using the Microsoft Azure HDInsight Emulator for learning or development
  • Understanding HDFS, HDFS vs. Azure Blob Storage, the MapReduce job framework, and the job execution pipeline
  • Doing big data analytics with MapReduce, writing your MapReduce programs in the .NET language of your choice, such as C#
  • Using Hive for big data analytics, with an end-to-end scenario and a look at how Apache Tez improves performance severalfold
  • Consuming HDInsight data from Microsoft BI tools over the Hive ODBC driver, and using HDInsight with Microsoft BI and Power BI to simplify data integration, analysis, and reporting
  • Using Pig for big data transformation workflows, step by step
  • Working with Apache HBase on HDInsight: its architecture and data model, HBase vs. Hive, and programmatically managing HBase data with C# and Apache Phoenix
  • Using Sqoop or SSIS (SQL Server Integration Services) to move data to/from HDInsight and build data integration workflows for transferring data
  • Using Oozie for scheduling, coordinating, and managing data processing workflows in an HDInsight cluster
  • Using the R programming language with HDInsight to perform statistical computing on big data sets
  • Using Apache Spark's in-memory computation model to run big data analytics up to 100 times faster than Hadoop MapReduce
  • Performing real-time stream analytics on high-velocity big data streams with Storm
  • Integrating an enterprise data warehouse with Hadoop and Microsoft Analytics Platform System (APS), formerly known as SQL Server Parallel Data Warehouse (PDW)
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid problems. By the time you're finished, you'll be comfortable going beyond the book to create any HDInsight app you can imagine!

