Apache Flume: Distributed Log Collection for Hadoop (What You Need to Know)
Steve Hoffman
- Publisher: Packt Publishing
- Publication date: 2013-07-04
- List price: $1,530
- Member price: 5% off, $1,454
- Language: English
- Pages: 108
- Binding: Paperback
- ISBN: 1782167919
- ISBN-13: 9781782167914
Related categories:
Hadoop
Overseas special-order books (separate checkout required)
Customers who bought this item also bought...
- Java Objects 徹底研究 (Beginning Java Objects: From Concepts to Code, 2/e) (list $720, sale $562)
- 上班族一定要會的 Excel 技巧-不必問前輩‧效率馬上 UP ! (list $320, sale $272)
- Linux 驅動程式開發實戰 (Essential Linux Device Drivers) (list $750, sale $593)
- Understanding Cryptography: A Textbook for Students and Practitioners (Hardcover) (list $2,120, sale $2,014)
- 精通 Python 3 程式設計, 2/e (Programming in Python 3: A Complete Introduction to the Python Language, 2/e) (list $680, sale $537)
- Google Android SDK 開發範例大全, 3/e (list $950, sale $751)
- Android 4.X 手機/平板電腦程式設計入門、應用到精通, 2/e (for Android 1.X~4.X) (list $520, sale $411)
- HTML & CSS : 網站設計建置優化之道 (HTML and CSS: Design and Build Websites) (list $580, sale $493)
- 微積分, 7/e (Stewart) (list $780, sale $764)
- ASP.NET MVC 4 網站開發美學 (list $680, sale $537)
- Visual C# 2012 資料庫程式設計暨進銷存系統實作 (list $650, sale $514)
- 超圖解 Arduino 互動設計入門 (includes an Arduino UNO R3 board) (list $1,130, sale $961)
- 易讀程式之美學-提升程式碼可讀性的簡單法則 (The Art of Readable Code) (list $480, sale $379)
- 雲端行動 App 設計與開發-使用 CmoreCloud 雲端行動 App 設計與開發,讓您不會寫程式也能輕鬆、快速的設計 App! (list $290, sale $226)
- 深入淺出 HTML and CSS, 2/e (Head First HTML and CSS, 2/e) (list $880, sale $695)
- 王者歸來-PHP 完全開發範例集, 2/e (list $860, sale $731)
- 無瑕的程式碼-敏捷軟體開發技巧守則 + 番外篇-專業程式設計師的生存之道 (two-book bundle) (list $940, sale $700)
- Raspberry Pi 從入門到應用 + Raspberry Pi rev 2 Model B 512MB (limited-quantity value bundle) (list $2,340, sale $1,825)
- 電腦網際網路, 6/e (International Edition) (Computer Networking: A Top-Down Approach, 6/e) (includes a CD with partial content) (list $650, sale $585)
- 一觸即發|Windows 8.1 玩全手冊 (list $299, sale $236)
- 透視 C語言指標-深度探索記憶體管理核心技術 (Understanding and Using C Pointers) (list $480, sale $379)
- 設計模式的解析與活用 (Design Patterns Explained: A New Perspective on Object-Oriented Design, 2/e) (list $480, sale $374)
- An Introduction to Mathematical Cryptography (Hardcover) (list $2,500, sale $2,375)
- 培養與鍛鍊程式設計的邏輯腦:世界級程式設計大賽的知識、心得與解題分享, 2/e (best reference for the CPE collegiate programming proficiency exam) (list $520, sale $406)
- 一次擁有 Linux 雙認證-LPIC Level I + Novell CLA 11 自學手冊, 2/e (list $750, sale $593)
Product Description
If your role includes moving datasets into Hadoop, this book will help you do it more efficiently using Apache Flume. From installation to customization, it is a complete step-by-step guide to making the service work for you.
Overview
- Integrate Flume with your data sources
- Transcode your data en route in Flume
- Route and separate your data using regular expression matching
- Configure failover paths and load-balancing to remove single points of failure
- Use gzip compression for files written to HDFS
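The failover and load-balancing bullets above correspond to what Flume calls sink groups and sink processors. The following Java sketch is a toy model and does not use the Flume API. The names `SinkGroupSketch`, `Sink`, `failover`, and `RoundRobin` are hypothetical. List order stands in for sink priority, and the backoff a real processor applies to sinks that keep failing is left out. The sketch only shows the two selection strategies.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model of two sink-group strategies. NOT the Flume API:
// SinkGroupSketch, Sink, failover and RoundRobin are hypothetical names.
public class SinkGroupSketch {

    // A "sink" is a name plus a delivery function that reports success or failure.
    public static final class Sink {
        final String name;
        final Predicate<String> deliver;

        public Sink(String name, Predicate<String> deliver) {
            this.name = name;
            this.deliver = deliver;
        }
    }

    // Failover: list order stands in for priority (index 0 is tried first).
    // Returns the name of the sink that accepted the event, or null if all failed.
    public static String failover(List<Sink> byPriority, String event) {
        for (Sink s : byPriority) {
            if (s.deliver.test(event)) {
                return s.name;
            }
        }
        return null; // caller decides what to do; nothing is retried here
    }

    // Load balancing: rotate the starting sink on each call and skip sinks that fail.
    public static final class RoundRobin {
        private int next = 0;

        public String pick(List<Sink> sinks, String event) {
            for (int i = 0; i < sinks.size(); i++) {
                int idx = (next + i) % sinks.size();
                if (sinks.get(idx).deliver.test(event)) {
                    next = (idx + 1) % sinks.size(); // start after this sink next time
                    return sinks.get(idx).name;
                }
            }
            return null; // every sink failed
        }
    }

    public static void main(String[] args) {
        Sink primary = new Sink("hdfs-primary", e -> false); // simulated outage
        Sink backup = new Sink("hdfs-backup", e -> true);
        System.out.println(failover(List.of(primary, backup), "event-1")); // hdfs-backup

        RoundRobin rr = new RoundRobin();
        List<Sink> pool = List.of(new Sink("a", e -> true), new Sink("b", e -> true));
        System.out.println(rr.pick(pool, "x")); // a
        System.out.println(rr.pick(pool, "y")); // b
        System.out.println(rr.pick(pool, "z")); // a
    }
}
```

Failover keeps a single active path and falls back only on error. Round-robin spreads events across all healthy sinks. Both approaches remove the single downstream point of failure.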
In Detail
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with many failover and recovery mechanisms.
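As a rough illustration of a streaming flow of this kind, here is a single-threaded Java toy model. It is not Flume code. A bounded `ArrayBlockingQueue` stands in for an in-memory channel between a source and a sink, and a `List` stands in for HDFS. The class name `FlowSketch` and its methods are hypothetical. Real Flume channels also wrap puts and takes in transactions, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of a source -> channel -> sink flow. NOT the Flume API;
// FlowSketch, put, drain and destination are hypothetical names.
public class FlowSketch {
    // The "channel" buffers events so the source and sink can run at different rates.
    private final BlockingQueue<String> channel;
    // Stand-in for the final destination (HDFS in a real deployment).
    private final List<String> destination = new ArrayList<>();

    public FlowSketch(int capacity) {
        this.channel = new ArrayBlockingQueue<>(capacity);
    }

    // Source side: returns false when the channel is full, so the caller can
    // back off and retry instead of silently dropping the event.
    public boolean put(String event) {
        return channel.offer(event);
    }

    // Sink side: takes up to batchSize events in FIFO order and "writes" them downstream.
    public int drain(int batchSize) {
        List<String> batch = new ArrayList<>();
        channel.drainTo(batch, batchSize);
        destination.addAll(batch);
        return batch.size();
    }

    public List<String> destination() {
        return destination;
    }

    public static void main(String[] args) {
        FlowSketch flow = new FlowSketch(2);
        System.out.println(flow.put("e1"));     // true
        System.out.println(flow.put("e2"));     // true
        System.out.println(flow.put("e3"));     // false: channel is full
        System.out.println(flow.drain(10));     // 2
        System.out.println(flow.destination()); // [e1, e2]
    }
}
```

The bounded capacity is the design point to notice. A full channel pushes back on the source instead of growing without limit.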
Apache Flume: Distributed Log Collection for Hadoop covers the problems of writing streaming data and logs to HDFS, and how Flume solves them. It explains Flume's general architecture, including moving data to and from databases and NoSQL-style data stores, as well as performance tuning. The book also includes real-world Flume implementation scenarios.
Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It also guides you through the complete process of installing and compiling Flume.
It shows you how to use channels and channel selectors. For each architectural component (Sources, Channels, Sinks, Channel Processors, Sink Groups, and so on), it covers the various implementations in detail, along with their configuration options, so that you can customize Flume to your specific needs. It also gives pointers on writing your own custom implementations.
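In Flume, routing by payload content is commonly done in two steps. First, an interceptor such as the regex extractor copies part of the event body into a header. Second, a multiplexing channel selector picks a channel based on that header's value. The Java sketch below imitates those two steps without using the Flume API. `RegexRoutingSketch`, the `level` header, and the `alerts`/`archive` channel names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of regex-based routing. NOT the Flume API; all names are hypothetical.
public class RegexRoutingSketch {
    private static final Pattern LEVEL = Pattern.compile("\\b(ERROR|WARN|INFO)\\b");
    // Header value -> channel name; anything unmapped goes to the default channel.
    private static final Map<String, String> MAPPING = Map.of("ERROR", "alerts", "WARN", "alerts");

    // Interceptor-like step: copy the first capture group of the first match into a header.
    public static Map<String, String> extractHeader(String body, Pattern pattern, String headerName) {
        Map<String, String> headers = new HashMap<>();
        Matcher m = pattern.matcher(body);
        if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
            headers.put(headerName, m.group(1));
        }
        return headers;
    }

    // Selector-like step: map the header value to a channel, falling back to a default
    // when the header is absent or has no mapping.
    public static String selectChannel(Map<String, String> headers, String headerName,
                                       Map<String, String> mapping, String defaultChannel) {
        String value = headers.get(headerName);
        if (value == null) {
            return defaultChannel; // Map.of rejects null lookups, so guard first
        }
        return mapping.getOrDefault(value, defaultChannel);
    }

    public static String route(String body) {
        Map<String, String> headers = extractHeader(body, LEVEL, "level");
        return selectChannel(headers, "level", MAPPING, "archive");
    }

    public static void main(String[] args) {
        System.out.println(route("2013-07-04 12:00:01 ERROR disk full")); // alerts
        System.out.println(route("2013-07-04 12:00:02 INFO started"));    // archive
        System.out.println(route("no level here"));                       // archive
    }
}
```

Keeping extraction and selection separate mirrors the two-step idea described above. The regex decides what is interesting, and the mapping table decides where it goes, so either can change without touching the other.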
What you will learn from this book
- Understand the Flume architecture
- Download and install open source Flume from Apache
- Discover when to use a memory or file-backed channel
- Understand and configure the Hadoop Distributed File System (HDFS) sink
- Learn how to use sink groups to create redundant data flows
- Configure and use various sources for ingesting data
- Inspect data records and route to different or multiple destinations based on payload content
- Transform data en route to Hadoop
- Monitor your data flows
Approach
A starter guide that covers Apache Flume in detail.
Who this book is written for
Apache Flume: Distributed Log Collection for Hadoop is intended for people, such as software engineers, database administrators, and data warehouse administrators, who are responsible for moving datasets into Hadoop in a timely and reliable manner.
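By the end of the book, you should be able to build a series of Flume agents that move your streaming data and logs from your systems into Hadoop in real time.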
