Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark (Paperback)
Akash Tandon, Sandy Ryza, Uri Laserson
Product Description
The amount of data being generated today is staggering and still growing. Apache Spark has emerged as the de facto tool for analyzing big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming.
Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing.
If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.
- Familiarize yourself with Spark's programming model and ecosystem
- Learn general approaches in data science
- Examine complete implementations that analyze large public datasets
- Discover which machine learning tools make sense for particular problems
- Explore code that can be adapted to many uses
About the Authors
Akash Tandon is an independent consultant and experienced full-stack data engineer. Previously, he was a senior data engineer at Atlan, where he built software for enterprise data science teams. In another life, he worked on data science projects for governments and built risk assessment tools at a fintech startup. As a student, he wrote open source software with the R Project for Statistical Computing and with Google. In his free time, he researches things for no good reason.
Sandy Ryza is a software engineer at Elementl. Previously, he developed algorithms for public transit at Remix and was a senior data scientist at Cloudera and Clover Health. He is an Apache Spark committer, Apache Hadoop PMC member, and founder of the Time Series for Spark project.
Uri Laserson is founder & CTO of Patch Biosciences. Previously, he worked on big data and genomics at Cloudera.
Sean Owen is a principal solutions architect focusing on machine learning and data science at Databricks. He is an Apache Spark committer and PMC member, and a co-author of Advanced Analytics with Spark. Previously, he was director of data science at Cloudera and an engineer at Google.
Josh Wills is an independent data science and engineering consultant, the former head of data engineering at Slack and data science at Cloudera, and wrote a tweet about data scientists once.