Sams Teach Yourself Apache Spark in 24 Hours (Paperback)

Jeffrey Aven

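  • Publisher: SAMS
  • Publication date: 2016-08-17
  • List price: $1,580
  • Sale price: $1,422 (10% off)
  • Language: English
  • Pages: 592
  • Binding: Paperback
  • ISBN: 0672338513
  • ISBN-13: 9780672338519
  • Related categories: Spark
  • Ships immediately (fewer than 3 in stock)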

Product description

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility.

This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, both now and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.

Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data.

Learn how to
• Discover what Apache Spark does and how it fits into the Big Data landscape
• Deploy and run Spark locally or in the cloud
• Interact with Spark from the shell
• Make the most of the Spark Cluster Architecture
• Develop Spark applications with Scala and functional Python
• Program with the Spark API, including transformations and actions
• Apply practical data engineering/analysis approaches designed for Spark
• Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
• Optimize Spark solution performance
• Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
• Leverage cutting-edge functional programming techniques
• Extend Spark with streaming, R, and Sparkling Water
• Start building Spark-based machine learning and graph-processing applications
• Explore advanced messaging technologies, including Kafka
• Preview and prepare for Spark’s next generation of innovations

Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.
