Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence

William H. Inmon, Anthony Nesavich

Product description

“The authors, the best minds on the topic, are breaking new ground. They show how every organization can realize the benefits of a system that can search and present complex ideas or data from what has been a mostly untapped source of raw data.”

--Randy Chalfant, CTO, Sun Microsystems

The Definitive Guide to Unstructured Data Management and Analysis--From the World’s Leading Information Management Expert

A wealth of invaluable information exists in unstructured textual form, but organizations have found it difficult or impossible to access and utilize it. This is changing rapidly: new approaches finally make it possible to glean useful knowledge from virtually any collection of unstructured data.

William H. Inmon--the father of data warehousing--and Anthony Nesavich introduce the next data revolution: unstructured data management. Inmon and Nesavich cover all you need to know to make unstructured data work for your organization. You’ll learn how to bring it into your existing structured data environment, leverage existing analytical infrastructure, and implement textual analytic processing technologies to solve new problems and uncover new opportunities. Inmon and Nesavich introduce breakthrough techniques covered in no other book--including the powerful role of textual integration, new ways to integrate textual data into data warehouses, and new SQL techniques for reading and analyzing text. They also present five chapter-length, real-world case studies--demonstrating unstructured data at work in medical research, insurance, chemical manufacturing, contracting, and beyond.

This book will be indispensable to every business and technical professional trying to make sense of a large body of unstructured text: managers, database designers, data modelers, DBAs, researchers, and end users alike.

Coverage includes

  • What unstructured data is, and how it differs from structured data
  • First-generation technology for handling unstructured data, from search engines to ECM--and its limitations
  • Integrating text so it can be analyzed with a common, colloquial vocabulary: integration engines, ontologies, glossaries, and taxonomies
  • Processing semistructured data: uncovering patterns, words, identifiers, and conflicts
  • Novel processing opportunities that arise when text is freed from context
  • Architecture and unstructured data: Data Warehousing 2.0
  • Building unstructured relational databases and linking them to structured data
  • Visualizations and Self-Organizing Maps (SOMs), including Compudigm and Raptor solutions
  • Capturing knowledge from spreadsheet data and email
  • Implementing and managing metadata: data models, data quality, and more

William H. Inmon is founder, president, and CTO of Inmon Data Systems. He is the father of the data warehouse concept, the corporate information factory, and the government information factory. Inmon has written 47 books on data warehouse, database, and information technology management, as well as more than 750 articles for trade journals such as Data Management Review, Byte, Datamation, and ComputerWorld. His b-eye-network.com newsletter currently reaches 55,000 people.

Anthony Nesavich worked at Inmon Data Systems, where he developed multiple reports that successfully query unstructured data.

Preface
1 Unstructured Textual Data in the Organization
2 The Environments of Structured Data and Unstructured Data
3 First Generation Textual Analytics
4 Integrating Unstructured Text into the Structured Environment
5 Semistructured Data
6 Architecture and Textual Analytics
7 The Unstructured Database
8 Analyzing a Combination of Unstructured Data and Structured Data
9 Analyzing Text Through Visualization
10 Spreadsheets and Email
11 Metadata in Unstructured Data
12 A Methodology for Textual Analytics
13 Merging Unstructured Databases into the Data Warehouse
14 Using SQL to Analyze Text
15 Case Study--Textual Analytics in Medical Research
16 Case Study--A Database for Harmful Chemicals
17 Case Study--Managing Contracts Through an Unstructured Database
18 Case Study--Creating a Corporate Taxonomy (Glossary)
19 Case Study--Insurance Claims
Glossary
Index
