Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining (Hardcover)

Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

Product Description

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.

  • Introduces the fundamental concepts behind the architecture of the web and of databases, covering HTTP, HTML, XML, JSON, and SQL.
  • Provides basic techniques to query web documents and data sets (XPath and regular expressions).
  • An extensive set of exercises guides the reader through each technique.
  • Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
  • Case studies are featured throughout, along with examples for each technique presented.
  • R code and solutions to exercises featured in the book are provided on a supporting website.
