Scala and Spark for Big Data Analytics

Md. Rezaul Karim, Sridhar Alla

Product Description

Key Features

  • Learn Scala's sophisticated type system, which combines functional programming and object-oriented concepts
  • Work on a wide array of applications, from simple batch jobs to stream processing and machine learning
  • Explore the most common use cases, as well as some complex ones, for large-scale data analysis with Spark

Book Description

Scala has seen a steady rise in adoption over the past few years, especially in data science and analytics. Going hand in hand with Scala is Apache Spark, which is built on Scala and is widely used in analytics.

If you want to leverage the power of both Scala and Spark to make sense of big data, this book is for you.

This book is divided into three parts. The first part introduces you to Scala programming, helping you understand its fundamentals so that you can program with Spark. The second part introduces Spark and the design choices behind it, and shows you how to perform data analysis with it. Finally, to shake things up, the book moves on to advanced Spark and teaches you topics such as monitoring, configuration, debugging, testing, and deployment.

By the end of this book, you will be able to perform full-stack data analysis with Spark and feel that no amount of data is too big.

What you will learn

  • Understand the basics of Scala and explore functional programming.
  • Get familiar with the Collections API, one of the most prominent features of the standard library.
  • Work with RDDs, the basic abstraction behind Apache Spark.
  • Use Spark for the analysis of structured and unstructured data and work with Spark SQL's APIs.
  • Take advantage of Spark for the analysis of streaming data and explore interoperability with streaming software like Apache Kafka.
  • Use common machine learning techniques like dimensionality reduction and one-hot encoding and build a predictive model using Spark.
  • Use Bayesian inference to build another kind of classification model and understand when the Decision Tree algorithm should be used.
  • Build a clustering model and use it to make predictions.
  • Tune your application and use Spark Testing Base.
  • Deploy a full Spark application on a cluster using Mesos.
