Learn Pyspark: Build Python-Based Machine Learning and Deep Learning Models (Paperback)

Singh, Pramod

  • Publisher: Apress
  • Publication date: 2019-09-07
  • List price: $1,670
  • VIP price: $1,587 (5% off)
  • Language: English
  • Pages: 295
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1484249607
  • ISBN-13: 9781484249604
  • Related categories: Python Programming Language, Spark, Machine Learning, Deep Learning
  • Overseas special-order book (requires separate checkout)

Product Description

Leverage machine learning and deep learning models to build applications on real-time data using PySpark. This book is ideal for those who want to learn to use PySpark to perform exploratory data analysis and solve an array of business challenges.
You'll start by reviewing PySpark fundamentals, such as Spark's core architecture, and see how to use PySpark for big data processing tasks such as data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms.
You'll then see how to schedule different Spark jobs using Airflow with PySpark and examine how to tune machine learning and deep learning models for real-time predictions. The book concludes with a discussion of graph frames and of performing network analysis using graph algorithms in PySpark. All the code presented in the book will be available as Python scripts on GitHub.
What You'll Learn

  • Develop pipelines for streaming data processing using PySpark
  • Build machine learning and deep learning models using PySpark's latest offerings
  • Perform graph analytics using PySpark
  • Create sequence embeddings from text data

Who This Book Is For

Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.


About the Author

Pramod Singh is currently a Manager (Data Science) at Publicis Sapient and works as the data science lead for a project with Mercedes Benz. He has spent the last nine years working on multiple data projects at SapientRazorfish, Infosys, and Tally, and has used both traditional and advanced machine learning and deep learning techniques in multiple projects using R, Python, Spark, and TensorFlow. Pramod has also been a regular speaker at major conferences in India and abroad and is currently authoring a couple of books on deep learning and AI techniques. He regularly conducts data science meetups at SapientRazorfish and presents webinars on machine learning and artificial intelligence. He lives in Bangalore with his wife and 2-year-old son. In his spare time, he enjoys coding, reading, and watching football.
