Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics (Quantitative Methods in the Humanities and Social Sciences)

Guillaume Desagulier

Product Description

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions for implementing the techniques used in the field. The statistical methodology and R-based coding in this book teach readers first the basic and then the more advanced skills needed to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also appeal to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter, with accompanying data sets, R code, and exercises for readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.
