Information Retrieval Models: Foundations and Relationships (Paperback)
Provisional Chinese title: 資訊檢索模型：基礎與關係 (平裝本)


Author: Thomas Roelleke

Publisher: Morgan & Claypool
Publication date: 2013-07-01
List price: $1,580 TWD
Member price: 5% off, $1,501 TWD
Language: English
Pages: 164
Binding: Paperback
ISBN: 1627050787
ISBN-13: 9781627050784
Related category: Text-mining

Ships immediately (stock = 1)

Product description

Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR).
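
As a rough illustration of the "Term-Frequency times Inverse-Document-Frequency" weighting named above, the following is a minimal sketch and not the book's own formulation. It assumes whitespace tokenization, raw term counts for TF, and the common textbook IDF of log(N / df); the function name `tf_idf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document as the sum over query terms of TF * IDF.

    Assumptions (not taken from the book): whitespace tokenization,
    raw counts as TF, and IDF = log(N / df).
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term absent from the collection, contributes nothing
            idf = math.log(n_docs / df[term])
            score += tf[term] * idf
        scores.append(score)
    return scores


# Hypothetical toy collection.
docs = ["sailing boats sailing", "boats on the river", "river walks"]
scores = tf_idf_scores("sailing boats", docs)
```

In this toy collection the first document ranks highest because it contains the rarer query term twice; the third document matches no query term and scores zero.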

Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works."

This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view of the main models.

A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, average term frequency, is a binding link between several retrieval models and model parameters.
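
To make the Poisson parameter mentioned above concrete, here is a small sketch under stated assumptions. It takes "average term frequency" to mean the total number of occurrences of a term in the collection divided by the number of documents, and it uses the standard Poisson probability mass function. The helper names and the toy collection are hypothetical and are not drawn from the book.

```python
import math


def average_term_frequency(term, docs):
    """Assumed Poisson parameter: total occurrences of `term` / number of documents."""
    total = sum(doc.lower().split().count(term) for doc in docs)
    return total / len(docs)


def poisson_probability(k, lam):
    """Standard Poisson pmf: P(k | lam) = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical toy collection.
docs = ["sailing boats sailing", "boats on the river", "river walks"]
lam = average_term_frequency("sailing", docs)

# Probability that a document contains the term 0, 1, or 2 times under this model.
probabilities = [poisson_probability(k, lam) for k in range(3)]
```

Under this reading, a term with a low average frequency has most of its probability mass at zero occurrences, so observing it several times in one document is informative.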

Table of Contents: List of Figures / Preface / Acknowledgments / Introduction / Foundations of IR Models / Relationships Between IR Models / Summary & Research Outlook / Bibliography / Author's Biography / Index
