Hadoop Application Architectures (Paperback)

Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira

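  • Publisher: O'Reilly
  • Publication date: 2015-08-18
  • List price: $2,070
  • VIP price: $1,967 (5% off)
  • Language: English
  • Pages: 400
  • Binding: Paperback
  • ISBN: 1491900083
  • ISBN-13: 9781491900086
  • Related category: Hadoop
  • Overseas special-order book (requires separate checkout)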

Product description

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use the various components in the Hadoop ecosystem, this practical book takes you through the architectural considerations needed to tie those components together into a complete application tailored to your particular use case.

To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process.

This book covers:

  • Factors to consider when using Hadoop to store and model data
  • Best practices for moving data in and out of the system
  • Data processing frameworks, including MapReduce, Spark, and Hive
  • Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
  • Giraph, GraphX, and other tools for large graph processing on Hadoop
  • Using workflow orchestration and scheduling tools such as Apache Oozie
  • Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
  • Architecture examples for clickstream analysis, fraud detection, and data warehousing
