Scala and Spark for Big Data Analytics
Md. Rezaul Karim, Sridhar Alla
- Learn Scala's sophisticated type system, which combines functional programming and object-oriented concepts
- Work on a wide array of applications from simple batch jobs to stream processing and machine learning
- Explore the most common use cases, as well as some complex ones, for large-scale data analysis with Spark
Scala has seen a steady rise in adoption over the past few years, especially in data science and analytics. Going hand in hand with Scala is Apache Spark, which is written in Scala and is widely used in analytics.
If you want to leverage the power of both Scala and Spark to make sense of Big Data, then this book is for you.
This book is divided into three parts. The first part introduces you to Scala programming, helping you understand its fundamentals so that you can program with Spark. The second part introduces Spark and the design choices behind it, and shows you how to perform data analysis with it. Finally, to shake things up, the book moves on to advanced Spark and teaches you topics such as monitoring, configuration, debugging, testing, and deployment.
By the end of this book, you will be able to perform full-stack data analysis with Spark and feel that no amount of data is too big.
What you will learn
- Understand the basics of Scala and explore functional programming.
- Get familiar with the Collections API, one of the most prominent features of the Scala standard library.
- Work with RDDs, the basic abstraction behind Apache Spark.
- Use Spark for the analysis of structured and unstructured data and work with Spark SQL's APIs.
- Take advantage of Spark for the analysis of streaming data and explore interoperability with streaming software like Apache Kafka.
- Use common machine learning techniques like dimensionality reduction and one-hot encoding, and build a predictive model using Spark.
- Use Bayesian inference to build another kind of classification model and understand when the decision tree algorithm should be used.
- Build a clustering model and use it to make predictions.
- Tune your application and use Spark Testing Base.
- Deploy a full Spark application on a cluster using Mesos.