Scala Data Analysis Cookbook

Arun Manivannan

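  • Publisher: Packt Publishing
  • Publication date: 2015-10-30
  • List price: $1,850
  • VIP price: $1,758 (5% off)
  • Language: English
  • Pages: 254
  • Binding: Paperback
  • ISBN: 1784396745
  • ISBN-13: 9781784396749
  • Related categories: JVM languages, Data Science
  • Ordered from the supplier after purchase (approx. 3-4 weeks)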

Product Description

Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes

About This Book

  • Implement Scala in your data analysis using features from Spark, Breeze, and Zeppelin
  • Scale up your data analytics infrastructure with practical recipes for Scala machine learning
  • Recipes for every stage of the data analysis process, from reading and collecting data to distributed analytics

Who This Book Is For

This book shows data scientists and analysts how to leverage their existing knowledge of Scala for high-quality, scalable data analysis.

What You Will Learn

  • Familiarize yourself with and set up the Breeze and Spark libraries, and use their data structures
  • Import data from a host of possible sources and create dataframes from CSV
  • Clean, validate and transform data using Scala to pre-process numerical and string data
  • Integrate quintessential machine learning algorithms using the Scala stack
  • Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
  • Run streaming and graph analytics in Spark to visualize data, enabling exploratory analysis

In Detail

This book will introduce you to the most popular Scala tools, libraries, and frameworks through practical recipes around loading, manipulating, and preparing your data. It will also help you explore and make sense of your data using stunning and insightful visualizations, and machine learning toolkits.

Starting with introductory recipes on utilizing the Breeze and Spark libraries, get to grips with how to import data from a host of possible sources and how to pre-process numerical, string, and date data. Next, you'll get an understanding of concepts that will help you visualize data using the Apache Zeppelin and Bokeh bindings in Scala, enabling exploratory data analysis. Discover how to program quintessential machine learning algorithms using the Spark ML library. Work through steps to scale your machine learning models and deploy them into a standalone cluster, EC2, YARN, and Mesos. Finally, dip into the powerful options presented by Spark Streaming and machine learning for streaming data, as well as utilizing Spark GraphX.

Style and approach

This book contains a rich set of recipes that covers the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala and Spark.
