Dataproc Cookbook: Running Spark and Hadoop Workloads in Google Cloud
Narasimha Sadineni, Anuyogam Venkataraman
- Publisher: O'Reilly
- Publication date: 2025-07-08
- List price: $2,560
- Member price: 5% off, $2,432
- Language: English
- Pages: 436
- Binding: Quality Paper (also called trade paper)
- ISBN: 1098157702
- ISBN-13: 9781098157708
Related categories:
Google Cloud, Hadoop, Spark
Product description
Get up to speed with Dataproc, the fully managed and highly scalable service for running open-source big data tools and frameworks, including Hadoop, Spark, Flink, and Presto. This cookbook shows data engineers, data scientists, data analysts, and cloud architects how to use Dataproc, integrated with Google Cloud, for data lake modernization, ETL, and secure data science at a fraction of the cost.
Narasimha Sadineni from Google and former Googler Anu Venkataraman show you how to set up and run Hadoop and Spark jobs on Dataproc. You'll learn how to create Dataproc clusters and run data engineering and data science workloads on long-running clusters, on ephemeral clusters, and in serverless mode. Along the way, you'll gain an understanding of Dataproc, orchestration, logging and monitoring, the Spark History Server, and migration patterns.
This cookbook includes hands-on examples for configuring, logging, and securing clusters, and for migrating from on-premises deployments to Dataproc. You'll learn how to:
- Create Dataproc clusters on Compute Engine and Kubernetes Engine
- Run data science workloads on Dataproc
- Execute Spark jobs on Dataproc Serverless
- Optimize Dataproc clusters for cost-effectiveness and performance
- Monitor Spark jobs in various ways
- Orchestrate various workloads and activities
- Use different methods for migrating data and workloads from existing Hadoop clusters to Dataproc