Guide to High Performance Distributed Computing: Case Studies with Hadoop, Scalding and Spark (Computer Communications and Networks)

K.G. Srinivasa, Anil Kumar Muppalla

Product Description

This timely text/reference describes the development and implementation of large-scale distributed processing systems using open-source tools and technologies. Comprehensive in scope, the book presents state-of-the-art material on building high-performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks. Features: describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high-performance distributed computing; presents an overview of the Hadoop ecosystem, followed by step-by-step instructions for its installation, programming and execution; reviews the basics of Spark, including resilient distributed datasets (RDDs), and examines Hadoop Streaming and working with Scalding; provides detailed case studies on approaches to clustering, data classification and regression analysis; explains the process of creating a working recommender system using Scalding and Spark.
