Mastering Hadoop 3: Big Data processing at scale to unlock unique business insights
Chanchal Singh;Manish Kumar;Dr. Timothy Wong
Your guide to mastering the most advanced concepts of Hadoop 3
- Master the newly introduced features and capabilities of Hadoop 3 - the world's most popular Big Data ecosystem
- Crunch and process your data with ease using MapReduce, YARN and a whole host of other tools within the Hadoop ecosystem
- A highly practical book with real-world case studies and easy-to-understand code to help you master Hadoop
Apache Hadoop is one of the most popular Big Data solutions for distributed storage and processing of large volumes of data. With Hadoop 3, Apache promises a higher-performance, more fault-tolerant, and more efficient Big Data processing platform, with a focus on better scalability.
This book is a comprehensive guide to the advanced concepts of the Hadoop ecosystem tools. You will learn how Hadoop works internally, study advanced concepts of the different ecosystem tools, work through solutions to real-world use cases, and see how to secure your cluster. The book then walks you through advanced concepts of HDFS, YARN, MapReduce, and Hadoop 3. It also addresses common challenges, such as how to use Kafka efficiently, design low-latency, reliable message delivery systems with Kafka, handle high data volumes, address the top-level concerns of building an enterprise-grade messaging system, and use different stream processing systems along with Kafka to meet enterprise goals.
By the end of this book, you will understand how components in the Hadoop ecosystem are integrated to implement a fast and reliable data pipeline, and how to tackle real-world problems when they occur in that pipeline.
What you will learn
- Get an in-depth understanding of distributed computing using Hadoop 3
- Develop enterprise-grade applications using Apache Spark, Flink, and more
- Build distributed, scalable, reliable, and high-performance Hadoop data pipelines with security, monitoring, and data governance in place
- Learn best practices for enterprises using or planning to use Hadoop 3 as a data platform
Who This Book Is For
If you want to become a Big Data professional by mastering the advanced concepts in Hadoop, this book is for you. If you're a Hadoop professional looking to strengthen your knowledge of the Hadoop ecosystem, this book will also help you. A fundamental knowledge of the Java programming language and the basics of Hadoop are required to get started with this book.