Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification

Jonathan Zdziarski

Considerable research by some brilliant minds has produced clever new ways to fight spam in all its nefarious forms. This landmark title describes, in depth, how next-generation spam filters use statistical filtering to identify and filter spam. The author explains how spam filtering works and how language classification and machine learning combine to produce remarkably accurate spam filters. Readers gain a complete understanding of the mathematical approaches used in today's spam filters, decoding, tokenization, the use of various algorithms (including Bayesian analysis and Markovian discrimination), and the benefits of using open-source solutions to end spam. Interviews with the creators of many of the best spam filters provide further insight into the anti-spam crusade. Fascinating reading for any geek.


Table of Contents:


PART I: An Introduction to Spam Filtering
Chapter 1: The History of Spam
Chapter 2: Historical Approaches to Fighting Spam
Chapter 3: Language Classification Concepts
Chapter 4: Statistical Filtering Fundamentals

PART II: Fundamentals of Statistical Filtering
Chapter 5: Decoding: Uncombobulating Messages
Chapter 6: Tokenization: The Building Blocks of Spam
Chapter 7: The Low-Down Dirty Tricks of Spammers
Chapter 8: Data Storage for a Zillion Records
Chapter 9: Scaling in Large Environments

PART III: Advanced Concepts of Statistical Filtering
Chapter 10: Testing Theory
Chapter 11: Concept Identification: Advanced Tokenization
Chapter 12: Fifth-Order Markovian Discrimination
Chapter 13: Intelligent Feature Set Reduction
Chapter 14: Collaborative Algorithms

Appendix: Shining Examples of Filtering