Spark: The Definitive Guide: Big Data Processing Made Simple (Paperback)

Bill Chambers, Matei Zaharia


Product Description

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of this open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine learning library.

  • Get a gentle overview of big data and Spark
  • Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples
  • Dive into Spark’s low-level APIs, including RDDs, and into how SQL and DataFrame queries are executed
  • Understand how Spark runs on a cluster
  • Debug, monitor, and tune Spark clusters and applications
  • Learn the power of Spark’s Structured Streaming, and of MLlib for machine learning tasks
  • Explore the wider Spark ecosystem, including SparkR and graph analysis
  • Examine Spark deployment, including coverage of Spark in the cloud
