Algorithmic Aspects of Parallel Data Processing (Foundations and Trends® in Databases)

Price: 3221 TWD
Availability: OnlineOnly
Author: Paraschos Koutris, Semih Salihoglu, Dan Suciu
ISBN: 1680834061

Publisher: Now Publishers Inc
Publication date: 2018-02-22
List price: NT$3,390
Member price: 5% off, NT$3,221
Language: English
Pages: 146
Binding: Paperback
ISBN-13: 9781680834062
Related categories: Spark

Overseas special-order book (must be checked out separately)

Product Description

The last decade has seen a huge and growing interest in processing large data sets on large distributed clusters. This trend began with the MapReduce framework and has since been adopted by many other systems, such as PigLatin, Hive, Scope, Dremel, Spark, and Myria. While the applications of these systems are diverse (for example, machine learning and data analytics), most involve relatively standard data processing tasks: identifying relevant data, cleaning, filtering, joining, grouping, transforming, extracting features, and evaluating results. This has generated great interest in the study of algorithms for data processing on large distributed clusters.

Algorithmic Aspects of Parallel Data Processing discusses recent algorithmic developments for distributed data processing. It uses a theoretical model of parallel processing called the Massively Parallel Computation (MPC) model, a simplification of the BSP model in which the only costs are the amount of communication and the number of communication rounds. The survey studies several algorithms for multi-join queries, sorting, and matrix multiplication, and discusses their relationships and the common techniques applied across these different data processing tasks.
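To make the MPC cost model above more concrete, here is a minimal sketch, not taken from the book, of a one-round parallel hash join of two relations R(A,B) and S(B,C) on p servers. It assumes Python's built-in hash as the partitioning function and measures communication as the maximum number of tuples any single server receives (a common way to state load in this setting); the function name mpc_hash_join is hypothetical. Local computation on each server is treated as free, matching a model in which only communication and rounds are charged.

```python
from collections import defaultdict


def mpc_hash_join(R, S, p):
    """Hypothetical sketch: one-round hash join in a simplified MPC setting.

    R: list of (a, b) tuples; S: list of (b, c) tuples; p: number of servers.
    Returns (joined tuples, maximum per-server load, number of rounds).
    """
    # The single communication round: route each tuple to the server
    # chosen by hashing its join attribute b, so matching tuples meet.
    inbox = [([], []) for _ in range(p)]
    for a, b in R:
        inbox[hash(b) % p][0].append((a, b))
    for b, c in S:
        inbox[hash(b) % p][1].append((b, c))

    # Communication cost: the largest number of tuples received by one server.
    max_load = max(len(r_part) + len(s_part) for r_part, s_part in inbox)

    # Local computation (not charged in the model): each server joins
    # its own fragments of R and S with an in-memory hash index on b.
    output = []
    for r_part, s_part in inbox:
        index = defaultdict(list)
        for b, c in s_part:
            index[b].append(c)
        for a, b in r_part:
            for c in index[b]:
                output.append((a, b, c))
    return output, max_load, 1


if __name__ == "__main__":
    R = [(i, i % 10) for i in range(100)]
    S = [(j % 10, j) for j in range(50)]
    out, load, rounds = mpc_hash_join(R, S, p=4)
    print(len(out), load, rounds)  # prints: 500 45 1
```

Hashing on the join attribute keeps the join to a single round, but the load depends on the data: if many tuples share one value of b, a single server receives all of them. This skew is a known weakness of plain hash partitioning.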
