Large Scale Machine Learning with Spark

Md. Rezaul Karim, Md. Mahedi Kaysar

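  • Publisher: Packt Publishing
  • Publication date: 2016-10-27
  • List price: $1,840
  • VIP price: $1,748 (5% off)
  • Language: English
  • Pages: 476
  • Binding: Paperback
  • ISBN: 1785888749
  • ISBN-13: 9781785888748
  • Related categories: Spark, Machine Learning
  • Overseas special-order title (must be checked out separately)
    No stock available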



Discover everything you need to build robust machine learning applications with Spark 2.0

About This Book

  • Get the most up-to-date book on the market that focuses on design, engineering, and scalable solutions in machine learning with Spark 2.0.0
  • Use Spark’s machine learning library in a big data environment
  • Learn how to develop high-value applications at scale with ease and create a personalized design

Who This Book Is For

This book is for data science engineers and scientists who work with large and complex data sets. You should be familiar with the basics of machine learning concepts, statistics, and computational mathematics. Knowledge of Scala and Java is advisable.

What You Will Learn

  • Get a solid theoretical understanding of ML algorithms
  • Configure Spark on cluster and cloud infrastructure to develop applications using Scala, Java, Python, and R
  • Scale up ML applications on large clusters or cloud infrastructure
  • Use Spark ML and MLlib to develop ML pipelines for recommendation systems, classification, regression, clustering, sentiment analysis, and dimensionality reduction
  • Handle large texts for developing ML applications with a strong focus on feature engineering
  • Use Spark Streaming to develop ML applications for real-time streaming
  • Tune ML models with cross-validation, hyperparameter tuning, and train-validation split
  • Enhance ML models to make them adaptable for new data in dynamic and incremental environments

In Detail

Data processing, algorithm implementation, tuning, scaling up, and finally deployment are crucial steps in optimising any application.

Spark can handle large-scale batch and streaming data, work out when to cache data in memory, and process it up to 100 times faster than Hadoop-based MapReduce. This means predictive analytics can be applied to both streaming and batch data to develop complete machine learning (ML) applications much more quickly, making Spark an ideal candidate for large data-intensive applications.

This book focuses on design, engineering, and scalable solutions using ML with Spark. First, you will learn how to install Spark with all the new features of the Spark 2.0 release. Moving on, you’ll explore important concepts such as advanced feature engineering with RDDs and Datasets. After studying how to develop and deploy applications, you will see how to use external libraries with Spark.

In summary, you will be able to develop complete and personalised ML applications, from data collection and model building through tuning and scaling up to deployment on a cluster or the cloud.

Style and approach

This book takes a practical approach where all the topics explained are demonstrated with the help of real-world use cases.

