Elasticsearch for Hadoop

Vishal Shukla

Product Description

Integrate Elasticsearch into Hadoop to effectively visualize and analyze your data

About This Book

  • Build production-ready analytics applications by integrating the Hadoop ecosystem with Elasticsearch
  • Learn complex Elasticsearch queries and develop real-time monitoring Kibana dashboards to visualize your data
  • Use Elasticsearch and Kibana to search data in Hadoop easily with this comprehensive, step-by-step guide

Who This Book Is For

This book is aimed at Java developers with a basic knowledge of Hadoop. No prior Elasticsearch experience is expected.

What You Will Learn

  • Set up the Elasticsearch-Hadoop environment
  • Import HDFS data into Elasticsearch with MapReduce jobs
  • Perform full-text search and aggregations efficiently using Elasticsearch
  • Visualize data and create interactive dashboards using Kibana
  • Check and detect anomalies in streaming data using Storm and Elasticsearch
  • Inject and classify real-time streaming data into Elasticsearch
  • Get production-ready for Elasticsearch-Hadoop based projects
  • Integrate with Hadoop ecosystem tools such as Pig, Storm, Hive, and Spark

In Detail

The Hadoop ecosystem is a de facto standard for processing terabytes and petabytes of data. Lucene-based Elasticsearch is becoming an industry standard for its full-text search and aggregation capabilities. Elasticsearch-Hadoop bridges Elasticsearch and the Hadoop ecosystem so that you get the best of both worlds. Combined with Kibana, this stack makes it easy to draw insights quickly from the massive amounts of data in your Hadoop ecosystem.

In this book, you'll learn to use Elasticsearch, Kibana and Elasticsearch-Hadoop effectively to analyze and understand your HDFS and streaming data.

You begin with an in-depth look at setting up Hadoop, Elasticsearch, Marvel, and Kibana. Right after this, you will learn to import Hadoop data into Elasticsearch by writing a MapReduce job in a real-world example. This is followed by a comprehensive look at Elasticsearch essentials, such as full-text search analysis, queries, filters, and aggregations, after which you will learn to create various visualizations and interactive dashboards using Kibana. Classifying your real-world streaming data and identifying trends in it using Storm and Elasticsearch are some of the other topics covered. You will also gain insight into the key concepts of Elasticsearch and Elasticsearch-Hadoop in distributed mode, and into advanced configurations, along with some common configuration presets you may need for production deployments. You will get a "go to production" checklist and a high-level view of cluster administration after deployment. Toward the end, you will learn to integrate Elasticsearch with other Hadoop ecosystem tools, such as Pig, Hive, and Spark.

Style and approach

A concise yet comprehensive approach has been adopted with real-time examples to help you grasp the concepts easily.
