MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems (Paperback)

Donald Miner, Adam Shook

Product Description

Design patterns for the MapReduce framework, until now, have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.

Each pattern is explained in context, with pitfalls and caveats clearly identified—so you can avoid some of the common design mistakes when modeling your Big Data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important.

Hadoop MapReduce code is provided to help you learn how to apply the design patterns by example.

Topics include:

  • Basic patterns, including map-only filter, group by, aggregation, distinct, and limit
  • Joins: traditional reduce-side join, reduce-side join with Bloom filter, replicated join with distributed cache, merge join, Cartesian products, and intersections
  • Binning, sharding for other systems, sorting, sampling, unions, and other patterns for organizing data
  • Job optimization patterns, including multi-job map-only job folding, and overloading the key grouping to perform two jobs at once
