Mastering Spark for Data Science
Tentative Chinese title: 掌握 Spark 進行資料科學分析

Name: Mastering Spark for Data Science
Price: 2185 TWD
Availability: OnlineOnly
Author: Andrew Morgan, Antoine Amend, David George, Matthew Hallett
ISBN: 1785882147

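Publisher: Packt Publishing
Publication date: 2017-03-30
List price: $2,300
VIP price: 5% off, $2,185
Language: English
Pages: 560
Binding: Paperback
ISBN: 1785882147
ISBN-13: 9781785882142
Related categories: Spark
Related translation: 精通 Spark 數據科學 (Simplified Chinese edition)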

Overseas special-order book (requires separate checkout)

Product description

Unlock the complexities of lightning-fast data science

About This Book

Develop and apply advanced analytical techniques with Spark
Learn how to tell a compelling story in data science using Spark's ecosystem
Explore data at scale and work with cutting-edge data science methods

Who This Book Is For

This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, and who are looking for a challenge and want to learn cutting-edge techniques. It assumes a working knowledge of data science, common machine learning methods, and popular data science tools, and that you have previously run proof-of-concept studies and built prototypes.

What You Will Learn

Learn the design patterns that integrate Spark into industrialized data science pipelines
Understand how commercial data scientists design scalable, reusable code for data science services
Get a grasp of new cutting-edge data science methods so you can study trends and causality
Find out how to use Spark as a universal ingestion engine and as a web scraper
Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
Grasp advanced Spark concepts, as well as solution design patterns and integration architectures
Demonstrate powerful data science pipelines
Get detailed guidance on how to run Spark in production

In Detail

The purpose of data science is to transform the world using data, and this goal is mainly achieved by disrupting and changing real processes in real industries. To operate at this level, you need to be able to build data science solutions of substance: ones that solve real problems and that run reliably enough for people to trust and act on. Spark has emerged as the big data platform of choice for data scientists.

This book dives deep into Spark to deliver production-grade data science solutions that are innovative, disruptive, and reliable enough to be trusted. We demonstrate the process by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. We use the core Spark APIs and take a deep dive into advanced libraries including Spark SQL, visual streaming, MLlib, and more.

We introduce advanced techniques and methods to help you build data science solutions, and show you how to construct commercial-grade data products. Using a sequence of tutorials that deliver a working news intelligence service, we explain advanced Spark architectures, unveil sophisticated data science methods, demonstrate how to work with geographic data in Spark, and explain how to tune Spark algorithms so they scale linearly.