Web Content Mining with Java: Techniques for Exploiting the World's Biggest Info

Tony Loton

  • Publisher: Wiley
  • Publication date: 2002-04-29
  • List price: $1,100
  • Member price: 2% off, $1,078
  • Language: English
  • Pages: 328
  • Binding: Paperback
  • ISBN: 047084311X
  • ISBN-13: 9780470843116
  • Related category: Java programming language
  • Ordered from the supplier once you place your order (approx. 5 to 7 days)

What do you do with the information on the websites you visit? You read it, print it, and maybe take a screen grab. But you could do much more with it if you could get hold of that information in a more usable form: one that you could manipulate, store and query automatically.

In this book you'll learn how to automate the:

  • discovery of websites containing interesting data
  • extraction of specific information from HTML and XML pages
  • presentation of aggregate information via your own portal
  • interpretation of data using text- and data-mining techniques
Java is the language of the web, so all practical examples are provided in the form of Java code that demonstrates HTTP communication, HTML and XML parsing, email retrieval and much more.

This is the book for you if you want real, practical help getting your Java-based information applications off the ground.

Table of Contents


About the Author


Surveying the Scene

Language of the Web

HTML and XML Parsing

Data Filters and Structured Queries

Building a Portal with Java

Building a Search Engine with Java

Mail Mining with Java

Introduction to Text Mining

Introduction to Data Mining

Loose Ends and Looking Ahead

Appendix A: Software Installation and Configuration

Appendix B: Javadoc Extracts

Appendix C: Earlier Versions of JAXP

Appendix D: License and Copyright Statements

Appendix E: Census 1891 Data XML

Appendix F: Share Price Cluster Data

Appendix G: Glossary of Acronyms


Further Reading