Machine Learning with Spark Second Edition

Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath

Product Description

Key Features

  • Get to grips with the latest version of Apache Spark
  • Utilize Spark's machine learning library to implement predictive analytics
  • Leverage Spark's powerful tools to load, analyze, clean, and transform your data

Book Description

Spark ML is the machine learning module of Spark. It uses in-memory RDDs to speed up machine learning tasks such as clustering, classification, and regression.

This book will teach you about popular machine learning algorithms and their implementation. You will learn how various machine learning concepts are implemented in the context of Spark ML. You will start by installing Spark on single-node and multinode clusters. Next, you'll see how to run Scala- and Python-based programs for Spark ML. Then you will work with a few datasets and go deeper into clustering, classification, and regression. Toward the end, the book also covers text processing using Spark ML.

Once you have learned the concepts, you can apply them either to green-field implementations or to migrating existing systems to this new platform. For example, you can migrate from Mahout or Scikit to Spark ML.

What you will learn

  • Get hands-on with the latest version of Spark ML
  • Create your first Spark program with Scala and Python
  • Set up and configure a development environment for Spark on your own computer, as well as on Amazon EC2
  • Access public machine learning datasets and use Spark to load, process, clean, and transform data
  • Use Spark's machine learning library to implement programs by utilizing well-known machine learning models
  • Deal with large-scale text data, including feature extraction and using text data as input to your machine learning models
  • Write Spark functions to evaluate the performance of your machine learning models
