Mastering Hadoop 3: Big Data processing at scale to unlock unique business insights

Chanchal Singh; Manish Kumar; Dr. Timothy Wong

  • Publisher: Packt Publishing
  • Publication date: 2019-02-28
  • List price: $1,800
  • VIP price: $1,710 (5% off)
  • Language: English
  • Pages: 797
  • Binding: Paperback
  • ISBN: 1788620445
  • ISBN-13: 9781788620444
  • Related categories: Hadoop, Big Data
  • Ships immediately (stock: 1)

Product Description

Your guide to mastering the most advanced concepts of Hadoop 3

Key Features

  • Master the newly introduced features and capabilities of Hadoop 3 - the world's most popular Big Data ecosystem
  • Crunch and process your data with ease using MapReduce, YARN and a whole host of other tools within the Hadoop ecosystem
  • A highly practical book with real-world case studies and easy-to-understand code to help you master Hadoop

Book Description

Apache Hadoop is one of the most popular Big Data solutions for distributed storage and processing of large volumes of data. With Hadoop 3, Apache promises to bring a high-performance, more fault-tolerant, and more efficient Big Data processing platform, with a focus on better scalability and efficiency.

This is a comprehensive guide to understanding the advanced concepts of the Hadoop ecosystem tools. You will learn how Hadoop works internally, study advanced concepts of different ecosystem tools, see solutions to some real-world use cases, and learn how to secure your cluster. The book then walks you through some of the advanced concepts of HDFS, YARN, MapReduce, and Hadoop 3. We will address common challenges such as how to use Kafka efficiently, how to design low-latency, reliable message delivery systems with Kafka, how to handle high data volumes, how to address the top-level concerns of building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to meet enterprise goals.

By the end of this book, you will understand how components in the Hadoop ecosystem are integrated to implement a fast and reliable data pipeline, and how to tackle different real-world problems when they occur in a data pipeline.

What you will learn

  • Get an in-depth understanding of distributed computing using Hadoop 3
  • Develop enterprise-grade applications using Apache Spark, Flink, and more
  • Build distributed, scalable, reliable, and high-performance Hadoop data pipelines with security, monitoring, and data governance in place
  • Learn best practices for enterprises using, or planning to use, Hadoop 3 as a data platform

Who This Book Is For

If you want to become a Big Data professional by mastering the advanced concepts in Hadoop, this book is for you. If you're a Hadoop professional looking to strengthen your knowledge of the Hadoop ecosystem, this book will also help you. A fundamental knowledge of the Java programming language and some basics of Hadoop are required to get started with this book.
