Mastering Text Mining with R (Paperback)

Ashish Kumar, Avinash Paul

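  • Publisher: Packt Publishing
  • Publication date: 2016-12-30
  • Price: $1,770
  • Member price: $1,682 (5% off)
  • Language: English
  • Pages: 258
  • Binding: Paperback
  • ISBN: 178355181X
  • ISBN-13: 9781783551811
  • Related categories: Text-mining
  • Overseas special-order book (requires separate checkout)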

Product Description

Master text-taming techniques and build effective text-processing applications with R

About This Book

  • This book will help you develop an in-depth understanding of the text mining process with lucid implementation in the R language
  • After reading this book, you will be able to enhance your skills in building text-mining apps with R
  • All the examples in the book use the latest version of R, making this an up-to-date edition on the market

Who This Book Is For

If you are an R programmer, analyst, or data scientist who wants to gain experience in performing text data mining and analytics with R, then this book is for you. Exposure to working with statistical methods and language processing would be helpful.

What You Will Learn

  • Get acquainted with some highly efficient R packages, such as openNLP and RWeka, to perform various steps in the text mining process
  • Access and manipulate data from different sources such as JSON and HTTP
  • Process text using regular expressions
  • Get to know different approaches to tagging text, such as POS tagging, to get started with text analysis
  • Explore different dimensionality reduction techniques, such as Principal Component Analysis (PCA), and understand their implementation in R
  • Discover the underlying themes or topics that are present in an unstructured collection of documents, using common topic models such as Latent Dirichlet Allocation (LDA)
  • Build a baseline sentence-completion application
  • Perform entity extraction and named entity recognition using R
  • Get an introduction to various approaches in opinion mining and their implementation in R
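The baseline sentence-completion application listed above can be approximated, in its simplest form, by a bigram model that suggests the words most often seen after the last word typed. The sketch below is an illustration in Python using only the standard library; it is not the book's R code, and the names `tokenize`, `build_bigram_model`, and `suggest_next`, as well as the three-sentence corpus, are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_bigram_model(corpus):
    """For every word, count which words follow it across all sentences."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def suggest_next(model, prefix, k=3):
    """Return up to k of the most frequent words seen after the last word of prefix."""
    tokens = tokenize(prefix)
    if not tokens or tokens[-1] not in model:
        return []  # empty prefix or unseen word: no suggestion
    return [word for word, _ in model[tokens[-1]].most_common(k)]


if __name__ == "__main__":
    # Hypothetical toy corpus for illustration only.
    corpus = [
        "text mining finds patterns in text",
        "text mining uses statistics",
        "topic models find themes in text",
    ]
    model = build_bigram_model(corpus)
    print(suggest_next(model, "I love text"))
```

Running the script prints `['mining']`, because "mining" is the only word that follows "text" in the toy corpus. A practical baseline would be trained on a much larger corpus and would usually add smoothing or back-off for unseen words.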

In Detail

Text mining (also called text data mining or text analytics) is the process of extracting useful, high-quality information from text by identifying patterns and trends through machine learning, statistical pattern learning, and related algorithms and methods. R provides an extensive ecosystem for mining text through its many frameworks and packages.

This book will help you develop a thorough understanding of the steps in the text mining process and gain confidence in applying the concepts to build text-data-driven products.

Starting with the basic statistical concepts used in text mining, the book teaches you how to access, cleanse, and process text using the R language, and how to analyze it. It will equip you with the tools and associated knowledge of different tagging, chunking, and entailment approaches and their use in natural language processing.
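The access, cleanse, and process steps described above often begin with regular-expression normalization followed by term counting. Below is a minimal sketch in Python (standard library only), not the book's R implementation; the `STOPWORDS` set and the helper names `clean_text` and `term_frequencies` are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real projects use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}


def clean_text(raw):
    """Normalize raw text: lowercase, drop URLs, keep only letters, collapse spaces."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs (rough pattern)
    text = re.sub(r"[^a-z\s]", " ", text)      # replace digits and punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def term_frequencies(raw):
    """Clean the text, drop stopwords, and count the remaining terms."""
    tokens = clean_text(raw).split()
    return Counter(t for t in tokens if t not in STOPWORDS)


if __name__ == "__main__":
    doc = "R is GREAT for text mining!!! See https://example.org, text mining in R."
    print(clean_text(doc))
    print(term_frequencies(doc).most_common(3))
```

The `[^a-z\s]` pattern also discards digits and non-ASCII letters, which is acceptable only for plain English input; the URL pattern is likewise a rough assumption rather than a full URL grammar.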

Moving on, the book will teach you different dimensionality reduction techniques and their implementation in R, along with topic modeling, text summarization, and extracting hidden themes from documents and collections. Next, it covers pattern recognition in text data using classification mechanisms, entity recognition, and the development of an ontology learning framework. You will learn the concept of an opinion in a text document and be able to apply various techniques to extract sentiment and opinions from it.
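One of the simplest techniques for extracting sentiment is lexicon-based scoring: each known word carries a polarity score, and a document's score is the sum of its words' scores. The sketch below is in Python rather than R and is not taken from the book; the `LEXICON` and `NEGATORS` values are toy, hypothetical data, and flipping the sign of a word that directly follows a negator is a deliberate simplification.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons score thousands of words.
LEXICON = {"good": 1, "great": 2, "excellent": 2, "bad": -1, "poor": -1, "terrible": -2}
# Hypothetical negation words: they flip the polarity of the word right after them.
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon scores, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)  # unknown words contribute 0
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(text):
    """Map the numeric score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["A great, excellent book", "Not good, the index is poor", "It has 258 pages"]:
        print(review, "->", label(review))
```

Under these assumptions, "Not good, the index is poor" scores -2 and is labeled negative, because the negator flips "good" from +1 to -1 and "poor" adds another -1.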

By the end of the book, you will have developed a practical application from the concepts learned and will understand how text mining can be leveraged to analyze the massive amount of data available on social media.
