Programming for Corpus Linguistics with Python and Dataframes

Keller, Daniel

  • Publisher: Cambridge
  • Publication date: 2024-07-31
  • Price: $2,520
  • VIP price: $2,394 (5% off)
  • Language: English
  • Pages: 75
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 1009486780
  • ISBN-13: 9781009486781
  • Related category: Python programming language
  • Not yet released; pre-orders welcome

Product Description

This Element offers intermediate or experienced programmers algorithms for corpus linguistics (CL) programming in the Python language, using dataframes, which provide a fast, efficient, and intuitive set of methods for working with large, complex datasets such as corpora. This Element demonstrates principles of dataframe programming applied to CL analyses, as well as complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is presented, including methods for tokenizing, part-of-speech tagging, and lemmatizing using spaCy. This Element provides a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.
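To give a rough sense of what a "dataframe corpus" and a dataframe-based concordance might look like, here is a minimal sketch in pandas. It is not taken from the book: the function names (`build_corpus`, `concordance`, `ngram_counts`) are hypothetical, and a simple lowercase regex tokenizer stands in for spaCy's tokenization, tagging, and lemmatization. The idea shown is one token per row, with context words and n-grams obtained by shifting the token column within each document, so windows never cross document boundaries.

```python
import re

import pandas as pd


def build_corpus(texts: dict[str, str]) -> pd.DataFrame:
    """One row per token. Regex tokenizer is a stand-in for spaCy (assumption)."""
    rows = []
    for doc_id, text in texts.items():
        for i, tok in enumerate(re.findall(r"\w+|[^\w\s]", text.lower())):
            rows.append({"doc": doc_id, "position": i, "token": tok})
    return pd.DataFrame(rows, columns=["doc", "position", "token"])


def concordance(corpus: pd.DataFrame, node: str, window: int = 3) -> pd.DataFrame:
    """KWIC lines for `node`, built by shifting tokens within each document."""
    cols = ["doc", "left", "node", "right"]
    by_doc = corpus.groupby("doc")["token"]
    ctx = corpus.copy()
    for k in range(1, window + 1):
        ctx[f"L{k}"] = by_doc.shift(k)    # k-th token to the left
        ctx[f"R{k}"] = by_doc.shift(-k)   # k-th token to the right
    hits = ctx[ctx["token"] == node]
    if hits.empty:
        return pd.DataFrame(columns=cols)
    left_cols = [f"L{k}" for k in range(window, 0, -1)]
    right_cols = [f"R{k}" for k in range(1, window + 1)]
    # Missing context only occurs at document edges, so stripping removes the gaps.
    left = hits[left_cols].fillna("").apply(" ".join, axis=1).str.strip()
    right = hits[right_cols].fillna("").apply(" ".join, axis=1).str.strip()
    out = pd.DataFrame({"doc": hits["doc"], "left": left,
                        "node": hits["token"], "right": right})
    return out[cols].reset_index(drop=True)


def ngram_counts(corpus: pd.DataFrame, n: int = 3) -> pd.Series:
    """Frequency of n-grams (candidate lexical bundles) within documents."""
    by_doc = corpus.groupby("doc")["token"]
    parts = [corpus["token"]] + [by_doc.shift(-k) for k in range(1, n)]
    grams = pd.concat(parts, axis=1, keys=range(n)).dropna()
    if grams.empty:
        return pd.Series(dtype="int64")
    return grams.apply(" ".join, axis=1).value_counts()
```

For example, `concordance(build_corpus({"a": "The cat sat on the mat."}), "cat", window=2)` yields one line with left context `the` and right context `sat on`. Real lexical-bundle studies would add frequency and dispersion thresholds on top of raw n-gram counts; those are omitted here.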