Apache Flume: Distributed Log Collection for Hadoop, 2/e (Paperback)

Steve Hoffman

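  • Publisher: Packt Publishing
  • Publication date: 2015-02-28
  • List price: $1,610
  • Member price: $1,530 (5% off)
  • Language: English
  • Pages: 175
  • Binding: Paperback
  • ISBN: 1784392170
  • ISBN-13: 9781784392178
  • Related categories: Hadoop
  • Ordered from the supplier once you place your order (approx. 3-4 weeks)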

Product Description

Design and implement a series of Flume agents to send streamed data into Hadoop

About This Book

  • Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
  • Configure failover paths and load balancing to remove single points of failure
  • Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS

Who This Book Is For

If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop File System (HDFS) is assumed.

What You Will Learn

  • Understand the Flume architecture, and also how to download and install open source Flume from Apache
  • Follow along with a detailed example of transporting weblogs in near real time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
  • Learn tips and tricks for transporting logs and data in your production environment
  • Understand and configure the Hadoop File System (HDFS) Sink
  • Use a morphline-backed Sink to feed data into Solr
  • Create redundant data flows using sink groups
  • Configure and use various sources to ingest data
  • Inspect data records and move them between multiple destinations based on payload content
  • Transform data en-route to Hadoop and monitor your data flows

In Detail

Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.

This book starts with an architectural overview of Flume and its logical components. It explores channels, sinks, and sink processors, followed by sources and channel selectors. By the end of this book, you will be fully equipped to construct a series of Flume agents to dynamically transport your stream data and logs from your systems into Hadoop.
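
To make the logical components named above more concrete, here is a minimal sketch in Python of how a channel buffers events and how a failover sink processor hands them to the highest-priority sink that is available. This is only a toy model under stated assumptions: `Channel`, `ListSink`, and `FailoverProcessor` are hypothetical names invented for illustration and are not part of Flume's API, and the sample log lines are made up.

```python
from collections import deque


class Channel:
    """In-memory buffer between a source and its sinks (toy stand-in for a channel)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Return the oldest event, or None when the channel is empty.
        return self._events.popleft() if self._events else None

    def put_back(self, event):
        # Return an undelivered event to the front of the queue.
        self._events.appendleft(event)


class ListSink:
    """Hypothetical sink that stores events in a list and can be marked as down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.delivered = []

    def process(self, event):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.delivered.append(event)


class FailoverProcessor:
    """Tries sinks from highest to lowest priority until one accepts the event."""

    def __init__(self, sinks_with_priority):
        # Assumption: a larger number means a more preferred sink.
        ordered = sorted(sinks_with_priority, key=lambda pair: -pair[1])
        self._sinks = [sink for sink, _ in ordered]

    def drain(self, channel):
        while (event := channel.take()) is not None:
            for sink in self._sinks:
                try:
                    sink.process(event)
                    break
                except ConnectionError:
                    continue
            else:
                # Every sink failed: keep the event in the channel and stop.
                channel.put_back(event)
                return False
        return True


# A "source" writes two sample log lines into the channel.
channel = Channel()
for line in ["GET /index.html", "GET /about.html"]:
    channel.put(line)

primary = ListSink("primary", healthy=False)  # simulate an outage
backup = ListSink("backup")
processor = FailoverProcessor([(primary, 10), (backup, 5)])
drained = processor.drain(channel)
print(backup.delivered)
```

In this sketch the primary sink is down, so both events end up in the backup sink while the channel is fully drained. A real Flume agent expresses the same idea declaratively in its configuration file rather than in code.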

This is a step-by-step book that guides you through the architecture and components of Flume, covering different approaches that are then pulled together in a real-world, end-to-end use case, progressing gradually from the simplest to the most advanced features.
