Data-Intensive Text Processing with MapReduce (Paperback)

Jimmy Lin, Chris Dyer

  • Publisher: Morgan & Claypool
  • Publication date: 2010-04-30
  • List price: $1,570
  • VIP price: $1,492 (5% off)
  • Language: English
  • Pages: 178
  • Binding: Paperback
  • ISBN: 1608453421
  • ISBN-13: 9781608453429
  • Related category: Distributed architecture
  • Overseas special-order title (must be checked out separately)

Product description

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.

MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.

This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book aims not only to help the reader "think in MapReduce" but also to discuss the limitations of the programming model.

Table of Contents: Introduction / MapReduce Basics / MapReduce Algorithm Design / Inverted Indexing for Text Retrieval / Graph Algorithms / EM Algorithms for Text Processing / Closing Remarks
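As a rough illustration of the programming model the description refers to, the sketch below counts words with a mapper and a reducer, all inside a single Python process. It is not taken from the book: the names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample documents are hypothetical. A real execution framework would also distribute the work across a cluster and handle scheduling, synchronization, and fault tolerance, none of which is modeled here.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Mapper (hypothetical name): emit a (word, 1) pair for each token."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reducer (hypothetical name): sum the partial counts for one word."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Run the map, shuffle, and reduce phases sequentially in one process.

    This only imitates the data flow; nothing here is distributed.
    """
    # Map phase plus shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Made-up sample input, for illustration only.
docs = {
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "The fox",
}
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The programmer supplies only the two small functions; grouping values by key is the part that a framework would normally perform between the two phases.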
