Text Processing with Ruby: Extract Value from the Data That Surrounds You (Paperback)

Rob Miller

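  • Publisher: Pragmatic Bookshelf
  • Publication date: 2015-10-02
  • List price: $1,200
  • Sale price: $960 (20% off)
  • Language: English
  • Pages: 274
  • Binding: Paperback
  • ISBN: 1680500708
  • ISBN-13: 9781680500707
  • Related categories: Ruby
  • Ships immediately (limited quantity; 2 in stock)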

Product Description

Text is everywhere. Web pages, databases, the contents of files--for almost any programming task you perform, you need to process text. Cut even the most complex text-based tasks down to size and learn how to master regular expressions, scrape information from Web pages, develop reusable utilities to process text in pipelines, and more.

Most information in the world is in text format, and programmers often find themselves needing to make sense of the data hiding within. It might be to convert it from one format to another, or to find out information about the text as a whole, or to extract information from it. But how do you do this efficiently, avoiding labor-intensive, manual work?

Text Processing with Ruby takes a practical approach. You'll learn how to get text into your Ruby programs from the file system and from user input. You'll process delimited files such as CSVs, and write utilities that interact with other programs in text-processing pipelines. Decipher character encoding mysteries, and avoid the pain of jumbled characters and malformed output.

You'll learn to use regular expressions to match, extract, and replace patterns in text. You'll write a parser and learn how to process Web pages to pull out information from even the messiest of HTML.

Before long you'll be able to tackle even the most enormous and entangled text with ease, scything through gigabytes of data and effortlessly extracting the bits that matter.

What You Need:

This book requires a passing familiarity with the Ruby programming language, and assumes that you already have Ruby installed on your computer.
