Python Data Analysis Cookbook

Ivan Idris

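  • Publisher: Packt Publishing
  • Publication date: 2016-07-22
  • List price: $2,040
  • VIP price: $1,938 (5% off)
  • Language: English
  • Pages: 462
  • Binding: Paperback
  • ISBN: 178528228X
  • ISBN-13: 9781785282287
  • Related categories: Python programming language, Data Science
  • Ordered from the supplier once you place your order (about 3 to 4 weeks)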

Product Description

Key Features

  • Analyze Big Data sets, create attractive visualizations, and manipulate and process various data types
  • Packed with rich recipes to help you learn and explore amazing algorithms for statistics and machine learning
  • Authored by Ivan Idris, an expert in Python programming and the author of eight highly reviewed books

Book Description

Data analysis is a rapidly evolving field, and Python is a multi-paradigm programming language suitable for object-oriented application development and functional design patterns. Because Python offers tools and libraries for a wide range of purposes, it has gradually become the primary language for data science, covering data analysis, visualization, and machine learning.

Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes, then dive into statistical data analysis using distribution algorithms and correlations. The recipes then help you find your way around different data and numerical problems, get to grips with Spark and HDFS, and set up migration scripts for web mining.

In this book, you will dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, then work with metrics and clusters. Finally, you will use parallelism and multiple threads to speed up your code and improve system performance.
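
As a rough illustration of the bootstrapping idea mentioned above, here is a minimal sketch. It is not taken from the book: it uses only Python's standard library rather than NumPy, and the function name `bootstrap_mean_ci`, its parameters, and the sample values are assumptions made up for this example.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of the means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9, 3.1, 4.0]  # made-up values
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Fixing the seed is in keeping with the reproducibility theme of the description: rerunning the script gives the same interval.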

By the end of the book, you will be capable of handling various data analysis techniques in Python and devising solutions for problem scenarios.

What You Will Learn

  • Set up reproducible data analysis
  • Clean and transform data
  • Apply advanced statistical analysis
  • Create attractive data visualizations
  • Web scrape and work with databases, Hadoop, and Spark
  • Analyze images and time series data
  • Mine text and analyze social networks
  • Use machine learning and evaluate the results
  • Take advantage of parallelism and concurrency

About the Author

Ivan Idris was born in Bulgaria to Indonesian parents. He moved to the Netherlands and graduated in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a software developer, data warehouse developer, and QA analyst.

His professional interests are business intelligence, big data, and cloud computing. He enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy Beginner's Guide, NumPy Cookbook, Learning NumPy, and Python Data Analysis, all by Packt Publishing.

Table of Contents

  1. Laying the Foundation for Reproducible Data Analysis
  2. Creating Attractive Data Visualizations
  3. Statistical Data Analysis and Probability
  4. Dealing with Data and Numerical Issues
  5. Web Mining, Databases, and Big Data
  6. Signal Processing and Timeseries
  7. Selecting Stocks with Financial Data Analysis
  8. Text Mining and Social Network Analysis
  9. Ensemble Learning and Dimensionality Reduction
  10. Evaluating Classifiers, Regressors, and Clusters
  11. Analyzing Images
  12. Parallelism and Performance
  13. Glossary
  14. Function Reference
