Optimizing Hadoop for MapReduce

Khaled Tannir

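  • Publisher: Packt Publishing
  • Publication date: 2014-02-21
  • List price: $1,520
  • VIP price: 5% off, $1,444
  • Language: English
  • Pages: 120
  • Binding: Paperback
  • ISBN: 1783285656
  • ISBN-13: 9781783285655
  • Related categories: Hadoop, distributed architecture
  • Ordered from the supplier once you place your order (about 3 to 4 weeks)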

Product Description

This book is the perfect introduction to sophisticated concepts in MapReduce and will ensure you have the knowledge to optimize job performance. This is not an academic treatise; it's an example-driven tutorial for the real world.

Overview

  • Optimize your MapReduce job performance
  • Identify your Hadoop cluster's weaknesses
  • Tune your MapReduce configuration

In Detail

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster by working in parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation.
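To make the idea of working in parallel on smaller data sets concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is not taken from the book and does not use the Hadoop API; the function names and sample splits are assumptions chosen for illustration, and a real cluster would run each map call on a different node.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split is mapped independently; on a cluster these calls
    # would run in parallel on separate nodes.
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two splits of a small text file.
splits = [["the quick brown fox", "the lazy dog"], ["the dog barks"]]
counts = word_count(splits)
print(counts["the"], counts["dog"])  # prints "3 2"
```

Because each split is mapped without reference to the others, the map step is the part that scales out across nodes; the shuffle is where data has to move between them.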

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster's node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.
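As a rough illustration of why a Combiner can help, the sketch below compares how many records leave the mappers with and without local pre-aggregation. It is a plain Python simulation under assumed sample data, not Hadoop code and not an excerpt from the book; the function names are hypothetical.

```python
from collections import Counter


def map_without_combiner(split):
    """Emit one (word, 1) record per word occurrence."""
    return [(word, 1) for line in split for word in line.split()]


def map_with_combiner(split):
    """Pre-aggregate counts locally, emitting one record per distinct word."""
    local_counts = Counter(word for line in split for word in line.split())
    return list(local_counts.items())


def shuffled_records(mapper_outputs):
    """Count the records that would cross the network in the shuffle."""
    return sum(len(output) for output in mapper_outputs)


def reduce_counts(mapper_outputs):
    """Sum the values for each word across all mapper outputs."""
    totals = Counter()
    for output in mapper_outputs:
        for word, n in output:
            totals[word] += n
    return dict(totals)


# Hypothetical input: two splits with many repeated words.
splits = [["a a a b", "a b"], ["b b c"]]

plain = [map_without_combiner(s) for s in splits]
combined = [map_with_combiner(s) for s in splits]

print(shuffled_records(plain), shuffled_records(combined))  # prints "9 4"
assert reduce_counts(plain) == reduce_counts(combined)
```

The final counts are identical in both cases; only the number of intermediate records changes. This works because summation is associative and commutative, which is the usual condition for reusing a reduce-style function as a Combiner.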

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

What you will learn from this book

  • Learn about the factors that affect MapReduce performance
  • Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
  • Size your Hadoop cluster's nodes
  • Set the number of mappers and reducers correctly
  • Optimize mapper and reducer task throughput and code size using compression and Combiners
  • Understand the various tuning properties and best practices to optimize clusters

Approach

This book is an example-based tutorial that deals with optimizing MapReduce job performance.

Who this book is written for

If you are a Hadoop administrator, developer, MapReduce user, or beginner who wishes to optimize clusters and applications, this book is the best choice available. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and the snippets of MapReduce class template code.
