High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

Name: High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Price: 2223 TWD
Availability: OnlineOnly
Authors: Holden Karau, Adi Polak, Rachel Warren
ISBN: 1098145852

Publisher: O'Reilly
Publication date: 2026-07-07
List price: $2,340
VIP price: 5% off, $2,223
Language: English
Pages: 409
Binding: Quality Paper - also called trade paper
ISBN: 1098145852
ISBN-13: 9781098145859
Related category: Spark

Overseas special-order title (requires separate checkout)

Product description

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau, Adi Polak, and Rachel Warren walk you through the secrets of the Spark code base and demonstrate performance optimizations that will help your data pipelines run faster, scale to larger datasets, and avoid costly antipatterns.

Ideal for data engineers, software engineers, data scientists, and system administrators, the second edition of High Performance Spark presents new use cases, code examples, and best practices for Spark 3.x and beyond. This book gives you a fresh perspective on this continually evolving framework and shows you how to work around bumps on your Spark and PySpark journey.

With this book, you'll learn how to:

- Accelerate your ML workflows with integrations including PyTorch
- Handle key skew and take advantage of Spark's new dynamic partitioning
- Make your code reliable with scalable testing and validation techniques
- Make Spark high performance
- Deploy Spark on Kubernetes and similar environments
- Take advantage of GPU acceleration with RAPIDS and resource profiles
- Get your Spark jobs to run faster
- Use Spark to productionize exploratory data science projects
- Handle even larger datasets with Spark
- Gain faster insights by reducing pipeline running times
