Practical Data Analysis, 2/e (Paperback)

Hector Cuesta, Dr. Sampath Kumar

Product Description

Key Features

  • Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
  • Apply Machine Learning algorithms to different kinds of data such as social networks, time series, and images
  • A hands-on guide to understanding the nature of data and how to turn it into insight

Book Description

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service.

This book explains the basic data algorithms without the theoretical jargon and gives you hands-on practice turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data, such as text, images, social network graphs, documents, and time series, and show you how to implement large-scale data processing with MongoDB and Apache Spark.

What you will learn

  • Acquire, format, and visualize your data
  • Build an image-similarity search engine
  • Generate meaningful visualizations anyone can understand
  • Get started with analyzing social network graphs
  • Find out how to implement sentiment text analysis
  • Install data analysis tools such as Pandas, MongoDB, and Apache Spark
  • Get to grips with Apache Spark
  • Implement machine learning algorithms such as classification or forecasting

About the Author

Hector Cuesta is founder and Chief Data Scientist at Dataxios, a machine intelligence research company. He holds a BA in Informatics and an M.Sc. in Computer Science. He provides consulting services for data-driven product design, with experience in a variety of industries including financial services, retail, fintech, e-learning, and human resources. He is a robotics enthusiast in his spare time.

Dr. Sampath Kumar works as an assistant professor and head of the Department of Applied Statistics at Telangana University. He has completed an M.Sc., an M.Phil., and a Ph.D. in statistics. He has five years of teaching experience in postgraduate courses and more than four years of experience in the corporate sector. His expertise is in statistical data analysis using SPSS, SAS, R, Minitab, MATLAB, and so on. He is an advanced programmer in SAS and MATLAB. He has taught a range of applied and pure statistics subjects, such as forecasting models, applied regression analysis, multivariate data analysis, and operations research, to M.Sc. students. He is currently supervising Ph.D. scholars.

Table of Contents

  1. Getting Started
  2. Preprocessing Data
  3. Getting to Grips with Visualization
  4. Text Classification
  5. Similarity-Based Image Retrieval
  6. Simulation of Stock Prices
  7. Predicting Gold Prices
  8. Working with Support Vector Machines
  9. Modeling Infectious Diseases with Cellular Automata
  10. Working with Social Graphs
  11. Working with Twitter Data
  12. Data Processing and Aggregation with MongoDB
  13. Working with MapReduce
  14. Online Data Analysis with Jupyter and Wakari
  15. Understanding Data Processing using Apache Spark
