Text Analysis Pipelines: Towards Ad-hoc Large-Scale Text Mining (Lecture Notes in Computer Science)

Henning Wachsmuth

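  • Publisher: Springer
  • Publication date: 2015-12-04
  • List price: $2,370
  • VIP price: $2,252 (95% of list price)
  • Language: English
  • Pages: 302
  • Binding: Paperback
  • ISBN: 3319257404
  • ISBN-13: 9783319257402
  • Related categories: Text-mining, Computer-Science
  • Overseas purchase-on-demand title (requires separate checkout)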

Product description

This monograph proposes a comprehensive and fully automatic approach to designing text analysis pipelines for arbitrary information needs that are optimal in terms of run-time efficiency and that robustly mine relevant information from text of any kind. Based on state-of-the-art techniques from machine learning and other areas of artificial intelligence, novel pipeline construction and execution algorithms are developed and implemented in prototypical software. Formal analyses of the algorithms and extensive empirical experiments underline that the proposed approach represents an essential step towards the ad-hoc use of text mining in web search and big data analytics.
Both web search and big data analytics aim to fulfill people’s needs for information in an ad-hoc manner. The information sought is often hidden in large amounts of natural language text. Instead of simply returning links to potentially relevant texts, leading search and analytics engines have started to directly mine relevant information from the texts. To this end, they execute text analysis pipelines that may consist of several complex information-extraction and text-classification stages. Due to practical requirements of efficiency and robustness, however, the use of text mining has so far been limited to anticipated information needs that can be fulfilled with rather simple, manually constructed pipelines.

