Statistical Data Cleaning with Applications in R

Mark van der Loo, Edwin de Jonge

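  • Publisher: Wiley
  • Publication date: 2018-04-16
  • List price: $2,600
  • Sale price: $2,080 (20% off)
  • Language: English
  • Pages: 320
  • Binding: Hardcover
  • ISBN: 1118897153
  • ISBN-13: 9781118897157
  • Related categories: R language, Data Science
  • Related translation: R統計數據清洗及應用 (Simplified Chinese edition)
  • Ships immediately (stock < 4)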

Product description

A comprehensive guide to automated statistical data cleaning 

The production of clean data is a complex and time-consuming process that requires both technical know-how and statistical expertise. Statistical Data Cleaning brings together a wide range of techniques for cleaning textual, numeric or categorical data. This book examines technical data cleaning methods relating to data representation and data structure. A prominent role is given to statistical data validation, data cleaning based on predefined restrictions, and data cleaning strategy.

Key features:

  • Focuses on the automation of data cleaning methods, including both theory and applications written in R.
  • Enables the reader to design data cleaning processes for either one-off analytical purposes or for setting up production systems that clean data on a regular basis.
  • Explores statistical techniques for solving issues such as incompleteness, contradictions and outliers, integration of data cleaning components and quality monitoring.
  • Supported by an accompanying website featuring data and R code.

This book enables data scientists and statistical analysts to deepen their understanding of data cleaning and to upgrade their practical data cleaning skills. It can also be used as material for a course in data cleaning and analysis.
