Python Data Cleaning Cookbook: Modern techniques and Python tools to detect and remove dirty data and extract key insights

Walker, Michael

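  • Publisher: Packt Publishing
  • Publication date: 2020-12-11
  • List price: $1,260
  • Member price: $1,197 (5% off)
  • Language: English
  • Pages: 436
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1800565666
  • ISBN-13: 9781800565661
  • Related category: Python programming language
  • Related translation: Python 數據清洗 (Simplified Chinese edition)
  • Ships immediately (stock = 1)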

Product description

Discover how to describe your data in detail, identify data issues, and resolve them using commonly used techniques, tips, and tricks.


Key features

  • Get well-versed with various data cleaning techniques to reveal key insights
  • Manipulate data of different complexities to shape them into the right form as per your business needs
  • Clean, monitor, and validate large data volumes to diagnose problems before moving on to data analysis


Book Description

Clean data is essential to reliable insights: jumping straight into analysis without proper data cleaning can lead to incorrect results. This book shows you tools and techniques for cleaning and handling data with Python. You'll begin by getting familiar with the shape of your data, using practices that can be applied routinely to most data sources. The book then teaches you how to manipulate data into a useful form. You'll also learn how to filter and summarize data to gain insights and to judge what makes sense and what does not, and how to operate on data to address the issues you've identified. Moving on, you'll perform key tasks such as handling missing values, validating data and catching errors, removing duplicate data, monitoring high volumes of data, and handling outliers and invalid dates. Next, you'll work through recipes that use supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and that generate visualizations for exploratory data analysis (EDA) to surface unexpected values. Finally, you'll build functions and classes that you can reuse without modification when new data arrives.


By the end of this Python book, you'll be equipped with all the key skills that you need to clean data and diagnose problems within it.
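Three of the tasks named in the description (removing duplicate data, handling missing values, and catching invalid dates) can be illustrated with a minimal sketch. This is not taken from the book, which works with pandas; it uses only the Python standard library, and the sample records, field names, and helper names (`RAW_ROWS`, `parse_date`, `clean_rows`) are assumptions made up for illustration.

```python
from datetime import datetime

# Hypothetical sample records; the field names are assumptions for illustration.
RAW_ROWS = [
    {"id": "1", "temp": "21.5", "date": "2020-01-15"},
    {"id": "2", "temp": "", "date": "2020-02-30"},      # missing temp, impossible date
    {"id": "1", "temp": "21.5", "date": "2020-01-15"},  # exact duplicate of the first row
    {"id": "3", "temp": "19.0", "date": "2020-03-01"},
]


def parse_date(text):
    """Return a date for YYYY-MM-DD text, or None if it is not a real calendar date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def clean_rows(rows):
    """Drop exact duplicate rows, convert types, and mark bad values as None."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows already seen with identical values
        seen.add(key)
        cleaned.append({
            "id": int(row["id"]),
            "temp": float(row["temp"]) if row["temp"].strip() else None,
            "date": parse_date(row["date"]),
        })
    return cleaned


cleaned = clean_rows(RAW_ROWS)
missing_temp = [r["id"] for r in cleaned if r["temp"] is None]
invalid_date = [r["id"] for r in cleaned if r["date"] is None]
print(len(cleaned), missing_temp, invalid_date)
```

Marking bad values as `None` rather than dropping the row keeps the decision about how to treat them (impute, exclude, or investigate) separate from the detection step.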


What you will learn

  • Find out how to read and analyze data from a variety of sources
  • Produce summaries of the attributes of data frames, columns, and rows
  • Filter data and select columns of interest that satisfy given criteria
  • Address messy data issues, including working with dates and missing values
  • Improve your productivity in Python pandas by using method chaining
  • Use visualizations to gain additional insights and identify potential data issues
  • Enhance your ability to learn what is going on in your data
  • Build user-defined functions and classes to automate data cleaning
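As a rough idea of what reusable summary and checking functions might look like, here is a small sketch using only the Python standard library. It is not the book's code: the function names (`summarize_column`, `flag_outliers`), the sample values, and the choice of the common 1.5 × IQR rule for flagging outliers are all assumptions made for illustration.

```python
import statistics


def summarize_column(values):
    """Summarize a numeric column, treating None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Hypothetical column with one missing value and one suspicious reading.
readings = [10, 12, None, 11, 13, 12, 95]
summary = summarize_column(readings)
outliers = flag_outliers(readings)
print(summary)
print(outliers)
```

Because both functions take a plain list and return plain values, they can be rerun unchanged on each new batch of data, which is the kind of reuse the last bullet describes.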


Who this book is for

This book is for anyone looking for ways to handle messy, duplicate, and poor-quality data using different Python tools and techniques. The book takes a recipe-based approach to help you learn how to clean and manage data. A working knowledge of Python programming is all you need to get the most out of the book.
