Modern Algorithms of Cluster Analysis (Studies in Big Data)

Slawomir Wierzchoń, Mieczyslaw Klopotek

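  • Publisher: Springer
  • Publication date: 2018-01-29
  • Price: $7,750
  • VIP price: $7,363 (5% off)
  • Language: English
  • Pages: 421
  • Binding: Hardcover
  • ISBN: 3319693077
  • ISBN-13: 9783319693071
  • Related categories: Big Data, Algorithms and Data Structures
  • Overseas special-order book (must be checked out separately)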

Product description

This book gives the reader a basic understanding of the formal concepts underlying cluster analysis, including cluster, clustering, and partition.

The book explains feature-based, graph-based and spectral clustering methods and discusses their formal similarities and differences. Understanding the related formal concepts is particularly vital in the era of Big Data: given the volume and characteristics of the data, it is no longer feasible to rely mainly on visual inspection when facing a clustering problem.

Clustering usually involves identifying similar objects and grouping them together. To help the reader choose similarity measures for complex and big data, the book describes measures of object similarity based on quantitative features (such as numerical measurement results), on qualitative features (such as text), and on combinations of the two. It also covers graph-based similarity measures for (hyper)linked objects and measures for multilayered graphs. Numerous variants are presented that show how such similarity measures can be used when defining clustering cost functions.
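As an illustration of combining quantitative and qualitative features in one similarity measure, here is a minimal sketch in the style of Gower's coefficient. It is not taken from the book and is not necessarily a measure the book uses. The function name `mixed_similarity` and the `ranges` argument are assumptions made for this example.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def mixed_similarity(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Gower-style similarity in [0, 1] between two records.

    ranges[i] is the observed value range of numeric feature i,
    or None when feature i is qualitative (categorical).
    """
    if not (len(a) == len(b) == len(ranges)) or len(a) == 0:
        raise ValueError("records and ranges must be non-empty and equally long")

    scores = []
    for x, y, value_range in zip(a, b, ranges):
        if value_range is None:
            # Qualitative feature: simple match / mismatch.
            scores.append(1.0 if x == y else 0.0)
        elif value_range > 0:
            # Quantitative feature: range-normalised closeness.
            scores.append(1.0 - abs(x - y) / value_range)
        else:
            # A constant feature carries no information; treat as a match.
            scores.append(1.0)

    # Unweighted average over all features.
    return sum(scores) / len(scores)
```

For example, with one numeric feature whose range is 40.0 and one categorical feature, `mixed_similarity((170.0, "red"), (180.0, "red"), (40.0, None))` averages a numeric score of 0.75 and a categorical score of 1.0, giving 0.875.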

 

In addition, the book provides an overview of approaches to handling large collections of objects in a reasonable time. In particular, it addresses grid-based methods, sampling methods, parallelization via MapReduce, the use of tree structures, random projections and various heuristic approaches, especially those used for community detection.
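To give a feel for one of the approaches listed above, the following is a minimal sketch of a grid-based method. It is a simplified illustration, not an algorithm from the book: points are binned into cells, cells holding at least `min_points` points are kept as dense, and adjacent dense cells are merged. The function name `grid_clusters` and its parameters are assumptions made for this example.

```python
from collections import defaultdict, deque
from itertools import product


def grid_clusters(points, cell_size, min_points):
    """Return clusters as sorted lists of point indices.

    Points that fall into sparse cells are treated as noise and omitted.
    """
    # 1. Assign every point to a grid cell (one pass over the data).
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # 2. Keep only the cells that hold enough points.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    # 3. Merge neighbouring dense cells with a breadth-first search.
    clusters = []
    seen = set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            cell = queue.popleft()
            members.extend(cells[cell])
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters
```

The appeal of this kind of method for large collections is that the work after binning depends on the number of occupied cells rather than on the number of point pairs. The result is sensitive to the choice of `cell_size` and `min_points`.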

