Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning

Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

Product Description

From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning.

You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems.

  • Preprocess and vectorize text into high-dimensional feature representations
  • Perform document classification and topic modeling
  • Steer the model selection process with visual diagnostics
  • Extract key phrases, named entities, and graph structures to reason about data in text
  • Build a dialog framework to enable chatbots and language-driven interaction
  • Use Spark to scale processing power and neural networks to scale model complexity
