Hadoop 2.x Administration Cookbook
Provisional Chinese title: Hadoop 2.x 管理實用手冊
Gurmukh Singh
- Publisher: Packt Publishing
- Publication date: 2017-05-29
- List price: $1,940
- VIP price: 5% off, $1,843
- Language: English
- Pages: 348
- Binding: Paperback
- ISBN: 1787126730
- ISBN-13: 9781787126732
- Related categories: Hadoop; overseas special-order books (separate checkout required)
Product Description
Key Features
- Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster
- Import data into and export data from Hive, and use Oozie to manage workflows
- Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available
Book Description
Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration.
The book begins by laying the foundation, showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain a Hadoop cluster, especially at the HDFS layer and when using YARN and MapReduce. Further on, you will explore the durability and high availability of a Hadoop cluster.
You'll get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks. You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration.
By the end of this book, you will have a proper understanding of working with Hadoop clusters and will be able to secure and encrypt them and configure auditing for them.
What you will learn
- Set up the Hadoop architecture to run a Hadoop cluster smoothly
- Maintain a Hadoop cluster on HDFS, YARN, and MapReduce
- Understand high availability with ZooKeeper and JournalNodes
- Configure Flume for data ingestion and Oozie to run various workflows
- Tune the Hadoop cluster for optimal performance
- Schedule jobs on a Hadoop cluster using the Fair and Capacity schedulers
- Secure your cluster and troubleshoot it for various common pain points
About the Author
Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies.
He has worked with companies such as HP, JP Morgan, and Yahoo.
He is the author of Monitoring Hadoop, published by Packt Publishing.
Table of Contents
- Hadoop Architecture and Deployment
- Maintain Hadoop Cluster - HDFS
- Maintain Hadoop Cluster - YARN and MapReduce
- High Availability
- Schedulers
- Backup and Recovery
- Data Ingestion and Workflow
- Performance Tuning
- HBase and RDBMS
- Cluster Planning
- Troubleshooting, Diagnostics, and Best Practices
- Security