Veracity of Data: From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics (Synthesis Lectures on Data Management)
Laure Berti-Équille, Javier Borge-Holthoefer
On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, and multimedia-sharing platforms). Conflicting information, rumors, and erroneous or fake content can easily spread across multiple sources, making it hard to distinguish between what is true and what is not. This monograph gives an overview of fundamental issues and recent contributions to ascertaining the veracity of data in the era of Big Data. The text is organized into six chapters, focusing on structured data extracted from texts. Chapter One introduces the problem of ascertaining the veracity of data in a multi-source and evolving context. Issues related to information extraction are presented in Chapter Two. Chapter Three follows with practical techniques for evaluating data source reputation and authoritativeness, including a review of the main models and Bayesian approaches to trust management. Current truth discovery computation algorithms are presented in detail in Chapter Four. The theoretical foundations and various approaches for modeling the diffusion of misinformation in networked systems are studied in Chapter Five. Finally, truth discovery computation from extracted data in a dynamic context of misinformation propagation raises interesting challenges that are explored in Chapter Six. Supplementary material, including source code, datasets, and slides, is offered online. This text is intended for a seminar course at the graduate level. It is also intended to serve as a useful resource for researchers and practitioners who are interested in the study of fact-checking, truth discovery, or rumor spreading.