Spark in Action

Petar Zecevic, Marko Bonaci

Product Description

Summary

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades.

About the Book

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code.

What's Inside

 

  • Updated for Spark 2.0
  • Real-life case studies
  • Spark DevOps with Docker
  • Examples in Scala, and online in Java and Python

About the Reader

Written for experienced programmers with some background in big data or machine learning.

About the Authors

Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community.

Table of Contents

 

PART 1 - FIRST STEPS

  1. Introduction to Apache Spark
  2. Spark fundamentals
  3. Writing Spark applications
  4. The Spark API in depth

PART 2 - MEET THE SPARK FAMILY

  5. Sparkling queries with Spark SQL
  6. Ingesting data with Spark Streaming
  7. Getting smart with MLlib
  8. ML: classification and clustering
  9. Connecting the dots with GraphX

PART 3 - SPARK OPS

  10. Running Spark
  11. Running on a Spark standalone cluster
  12. Running on YARN and Mesos

PART 4 - BRINGING IT TOGETHER

  13. Case study: real-time dashboard
  14. Deep learning on Spark with H2O
