Frank Kane's Taming Big Data with Apache Spark and Python
Frank Kane
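- Publisher: Packt Publishing
- Publication date: 2017-06-30
- List price: $1,610
- VIP price: 5% off, $1,530
- Language: English
- Pages: 296
- Binding: Paperback
- ISBN: 1787287947
- ISBN-13: 9781787287945
- Related categories: Spark; overseas special-order books (separate checkout required)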
Product Description
Key Features
- Understand how Spark can be distributed across computing clusters
- Develop and run Spark jobs efficiently using Python
- A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark
Book Description
Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you'll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python.
Apache Spark has emerged as the next big thing in the Big Data domain - quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses.
Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.
What you will learn
- Find out how you can identify Big Data problems as Spark problems
- Install and run Apache Spark on your computer or on a cluster
- Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
- Implement machine learning on Spark using the MLlib library
- Process continuous streams of data in real time using the Spark Streaming module
- Perform complex network analysis using Spark's GraphX library
- Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster
About the Author
My name is Frank Kane. I spent nine years at Amazon and IMDb, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.
Table of Contents
- Getting Started with Spark
- Spark Basics and Simple Examples
- Advanced Examples of Spark Programs
- Running Spark on a Cluster
- SparkSQL, DataFrames, and Datasets
- Other Spark Technologies and Libraries
- Where to Go From Here? - Learning More About Spark and Data Science
