MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems (Paperback)

Donald Miner, Adam Shook



Design patterns for the MapReduce framework, until now, have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.

Each pattern is explained in context, with pitfalls and caveats clearly identified—so you can avoid some of the common design mistakes when modeling your Big Data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important.

Hadoop MapReduce code is provided to help you learn how to apply the design patterns by example.

Topics include:

  • Basic patterns, including map-only filter, group by, aggregation, distinct, and limit
  • Joins: traditional reduce-side join, reduce-side join with Bloom filter, replicated join with distributed cache, merge join, Cartesian products, and intersections
  • Binning, sharding for other systems, sorting, sampling, unions, and other patterns for organizing data
  • Job optimization patterns, including multi-job map-only job folding and overloading the key grouping to perform two jobs at once



