Fast Data Processing with Spark, 2/e (Paperback)

Krishna Sankar, Holden Karau

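  • Publisher: Packt Publishing
  • Publication date: 2015-03-31
  • List price: $1,250
  • VIP price: $1,188 (5% off)
  • Language: English
  • Pages: 184
  • Binding: Paperback
  • ISBN: 178439257X
  • ISBN-13: 9781784392574
  • Related categories: Spark
  • Ordered from the supplier once you place your order (about 3 to 4 weeks)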

Product Description

Perform real-time analytics using Spark in a fast, distributed, and scalable way

About This Book

  • Develop a machine learning system with Spark's MLlib and scalable algorithms
  • Deploy Spark jobs to various clusters such as Mesos, EC2, Chef, YARN, EMR, and so on
  • This is a step-by-step tutorial that unleashes the power of Spark and its latest features

Who This Book Is For

Fast Data Processing with Spark - Second Edition is for software developers who want to learn how to write distributed programs with Spark. It will help developers whose problems are too big to handle on a single computer. No previous experience with distributed programming is necessary. This book assumes knowledge of Java, Scala, or Python.

What You Will Learn

  • Install and set up Spark on your cluster
  • Prototype distributed applications with Spark's interactive shell
  • Learn different ways to interact with Spark's distributed representation of data (RDDs)
  • Query Spark with a SQL-like query syntax
  • Effectively test your distributed software
  • Recognize how Spark works with big data
  • Implement machine learning systems with highly scalable algorithms

In Detail

Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those Hadoop MapReduce addresses, but with a fast in-memory approach and a clean, functional-style API. With its ability to integrate with Hadoop and its built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (GraphX), and real-time analysis (Spark Streaming), it can be used interactively to process and query big datasets quickly.
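
To give a rough feel for what a "functional style" pipeline looks like, here is a minimal word-count sketch. It is plain Python using only the standard library, not Spark code and not an excerpt from the book; the function name `word_count` and the sample input are made up for illustration. The three steps are meant to mirror the RDD operations `flatMap`, `map`, and `reduceByKey`, which Spark would run across a cluster rather than in a single process.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def word_count(lines):
    """Count words with a flatMap -> map -> reduce-by-key chain (no Spark involved)."""
    # Step 1, analogous to flatMap: split each line into words and flatten.
    words = chain.from_iterable(line.split() for line in lines)

    # Step 2, analogous to map: pair each lowercased word with a count of 1.
    pairs = ((word.lower(), 1) for word in words)

    # Step 3, analogous to reduceByKey: sum the counts for each distinct word.
    def add_pair(acc, pair):
        key, value = pair
        acc[key] += value
        return acc

    return dict(reduce(add_pair, pairs, defaultdict(int)))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["Spark is fast", "Spark is distributed"]
    print(word_count(sample))
```

Each step is a transformation applied to the whole collection rather than an explicit loop with shared state, which is the property that lets a framework like Spark split the work across machines.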

Fast Data Processing with Spark - Second Edition covers how to write distributed programs with Spark. The book will guide you through every step required to write effective distributed programs, from setting up your cluster and interactively exploring the API to developing analytics applications and tuning them for your purposes.
