Apache Hudi: The Definitive Guide: Building Robust, Open, and High-Performing Data Lakehouses

Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran

  • Publisher: O'Reilly
  • Publication date: 2025-12-02
  • List price: $2,340
  • VIP price: $2,223 (5% off)
  • Language: English
  • Pages: 287
  • Binding: Quality Paper - also called trade paper
  • ISBN: 109817383X
  • ISBN-13: 9781098173838
  • Related categories: Big Data
  • Not yet released; cannot be ordered

Product description

Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using their query engine of choice.

Authors Shiyan Xu, Prashant Wason, Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications.

This book helps you:

  • Understand the need for transactional data lakehouses and the challenges associated with building them
  • Get up to speed with Apache Hudi and learn how it makes building data lakehouses easy
  • Explore data ecosystem support provided by Apache Hudi for popular data sources and query engines
  • Perform different write and read operations on Apache Hudi tables and effectively use them for various use cases, including batch and stream applications
  • Implement data engineering techniques to operate and manage Apache Hudi tables
  • Apply different storage techniques and considerations, such as indexing and clustering, to maximize your lakehouse performance
  • Build end-to-end incremental data pipelines using Apache Hudi for faster ingestion and fresher analytics
