A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R

Samuel E. Buttrey, Lyn R. Whitaker

  • Publisher: Wiley
  • Publication date: 2017-12-18
  • List price: $2,800
  • Sale price: $2,520 (10% off)
  • Language: English
  • Pages: 312
  • Binding: Hardcover
  • ISBN: 1119080029
  • ISBN-13: 9781119080022
  • Related categories: R, Data Science
  • Ships immediately (stock: 1)

Product description

The only how-to guide offering a unified, systematic approach to acquiring, cleaning, and managing data in R

Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Adding to the difficulty is that most modelers learn the steps involved in cleaning and managing data piecemeal, often on the fly, or they develop their own ad hoc methods. This book helps simplify their task by providing a unified, systematic approach to acquiring, modeling, manipulating, cleaning, and maintaining data in R. 

Starting with the very basics, data scientists Samuel E. Buttrey and Lyn R. Whitaker walk readers through the entire process. From what data looks like and what it should look like, they progress through all the steps involved in getting data ready for modeling. They describe best practices for acquiring data from numerous sources; explore key issues in data handling, including text/regular expressions, big data, parallel processing, merging, matching, and checking for duplicates; and outline highly efficient and reliable techniques for documenting data and recordkeeping, including audit trails, getting data back out of R, and more.

  • The only single-source guide to R data and its preparation, it describes best practices for acquiring, manipulating, cleaning, and maintaining data
  • Begins with the basics and walks readers through all the steps necessary to get data ready for the modeling process
  • Provides expert guidance on how to document the processes described so that they are reproducible
  • Written by seasoned professionals, it provides both introductory and advanced techniques
  • Features case studies with supporting data and R code, hosted on a companion website

A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R is a valuable working resource/bench manual for practitioners who collect and analyze data, lab scientists and research associates of all levels of experience, and graduate-level data mining students.
