Apache Spark 2 for Beginners (Paperback)

Rajanarayanan Thottuvaikkatumana

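  • Publisher: Packt Publishing
  • Publication date: 2016-09-30
  • List price: $1,580
  • VIP price: $1,501 (5% off)
  • Language: English
  • Pages: 332
  • Binding: Paperback
  • ISBN: 1785885006
  • ISBN-13: 9781785885006
  • Related categories: Spark
  • Ordered from the supplier once the order is placed (approx. 3 to 4 weeks)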

Product Description

Key Features

  • This book offers an easy introduction to the Spark framework, based on the latest version of Apache Spark 2
  • Perform efficient data processing, machine learning and graph processing using various Spark components
  • A practical guide aimed at beginners to get them up and running with Spark

Book Description

Spark is one of the most widely used large-scale data processing engines and runs extremely fast. Its tools are equally useful to application developers and data scientists.

This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. The Spark programming model is then introduced through real-world examples, followed by Spark SQL programming with DataFrames. An introduction to SparkR comes next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we take a look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines all the skills you learned in the preceding chapters to develop a real-world Spark application.

By the end of this book, you will have all the knowledge you need to develop efficient large-scale applications using Apache Spark.

What you will learn

  • Get to know the fundamentals of Spark 2 and the Spark programming model using Scala and Python
  • Know how to use Spark SQL and DataFrames using Scala and Python
  • Get an introduction to Spark programming using R
  • Perform Spark data processing, charting, and plotting using Python
  • Get acquainted with Spark stream processing using Scala and Python
  • Be introduced to machine learning using Spark MLlib
  • Get started with graph processing using Spark GraphX
  • Bring together all that you've learned and develop a complete Spark application

About the Author

Rajanarayanan Thottuvaikkatumana, Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has lived and worked in India, Singapore, and the USA, and is presently based in the UK. His experience includes architecting, designing, and developing software applications. He has worked on various technologies, including major databases, application development platforms, web technologies, and big data technologies. Since 2000, he has been working mainly in Java-related technologies, and does heavy-duty server-side programming in Java and Scala. He has worked on highly concurrent, highly distributed, high-transaction-volume systems. Currently, he is building a next-generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.

Raj holds a master's degree in Mathematics and a master's degree in Computer Information Systems, and has many certifications in ITIL and cloud computing to his credit. Raj is the author of Cassandra Design Patterns - Second Edition, published by Packt.

When not working on the assignments his day job demands, Raj is an avid listener of classical music and watches a lot of tennis.

Table of Contents

  1. Spark Fundamentals
  2. Spark Programming Model
  3. Spark SQL
  4. Spark Programming with R
  5. Spark Data Analysis with Python
  6. Spark Stream Processing
  7. Spark Machine Learning
  8. Spark Graph Processing
  9. Designing Spark Applications
