Learning Spark: Lightning-Fast Data Analytics, 2/e (Paperback)
Damji, Jules S., Wenig, Brooke, Das, Tathagata, Lee, Denny
- Publisher: O'Reilly
- Publication date: 2020-08-25
- List price: $2,640
- Sale price: 10% off, $2,376
- Language: English
- Pages: 300
- Binding: Quality paper (also called trade paper)
- ISBN: 1492050040
- ISBN-13: 9781492050049
- Related categories: Spark
- Related translations: Spark快速大數據分析 第2版 (Simplified Chinese edition)
Product description
Data is bigger, arrives faster, and comes in a variety of formats--and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.
Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:
- Learn the high-level Structured APIs in Python, SQL, Scala, or Java
- Understand Spark operations and the Spark SQL engine
- Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
About the authors
Jules S. Damji is a senior developer advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 20 years of experience and has worked as a software engineer at leading companies such as Sun Microsystems, Netscape, @Home, Loudcloud/Opsware, Verisign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science and an MA in political advocacy and communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.
Brooke Wenig is a machine learning practice lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, and she teaches courses on distributed machine learning best practices. Previously, she was a principal data science consultant at Databricks. She holds an M.S. in computer science from UCLA with a focus on distributed machine learning.
Tathagata Das is a staff software engineer at Databricks, an Apache Spark committer, and a member of the Apache Spark Project Management Committee (PMC). He is one of the original developers of Apache Spark, the lead developer of Spark Streaming (DStreams), and is currently one of the core developers of Structured Streaming and Delta Lake. Tathagata holds an M.S. in computer science from UC Berkeley.
Denny Lee is a staff developer advocate at Databricks who has been working with Apache Spark since version 0.6. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has an M.S. in biomedical informatics from Oregon Health & Science University and has architected and implemented data solutions for enterprise healthcare customers.