Hands-On Handbook of Data Analysis and Mining with R (R語言數據分析與挖掘實戰手冊)

Cheng Jing (程靜)

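  • Publisher: China Railway Publishing House (中國鐵道)
  • Publication date: 2019-06-01
  • List price: $359
  • Sale price: $269 (25% off)
  • Language: Simplified Chinese
  • Pages: 256
  • Binding: Paperback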
  • ISBN: 7113257453
  • ISBN-13: 9787113257453
  • Related categories: R language, Data Science
  • Ships immediately

Product description

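This book systematically introduces techniques for data analysis and data mining with the R language,
organized so that the material progresses from the basics to more advanced topics. It opens with basic R operations,
then covers data analysis methods such as regression analysis and analysis of variance,
which help explore the internal structure of data and extract the information it contains;
more importantly, these methods provide the theoretical basis for the data mining that follows.
Finally, it introduces typical data mining tools and methods, each arranged from theoretical foundations to algorithm description to hands-on case study,
so that readers grasp the essence of data mining and can put the algorithms to use as they learn them.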

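About the author

Cheng Jing (程靜)

A graduate of Chongqing University, Cheng Jing currently works at Zhubajie (豬八戒網), the most highly valued internet company in western China,
as a senior data analysis engineer, responsible for collecting, organizing, and analyzing industry data
and for producing industry research, assessments, and forecasts based on that data.
Cheng is experienced in big data analysis and a range of data mining algorithms, and is proficient in both R and Python.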

Table of contents

Chapter 1: Introduction to R

1.1 Installing and running R

1.1.1 Installing, starting, and closing R

1.1.2 Installing and using R packages

1.2 R data structures

1.2.1 R objects and types

1.2.2 Vectors

1.2.3 Arrays and matrices

1.2.4 Lists

1.2.5 Data frames

Chapter 2: Reading and Saving Data

2.1 Reading data

2.1.1 Reading built-in datasets and text files

2.1.2 Reading Excel and CSV data

2.1.3 Reading R-format data and web page data

2.1.4 Reading data in other formats

2.2 Saving data

2.2.1 Writing data out

2.2.2 Using the cat() function

2.2.3 Saving as R-format files

2.2.4 Saving as other file types

Chapter 3: Data Preprocessing

3.1 Handling missing values

3.1.1 Detecting missing values

3.1.2 Identifying missing-data patterns

3.1.3 Common handling methods

3.2 Data tidying

3.2.1 Merging data

3.2.2 Selecting subsets

3.2.3 Transforming data

Chapter 4: Exploratory Data Analysis

4.1 Basic plotting functions

4.2 Exploring a single variable

4.2.1 Graphical description of a single group of data

4.2.2 Descriptive analysis of a single group of data

4.3 Exploring multiple variables

4.3.1 Graphical description of two groups of data

4.3.2 Graphical description of multiple groups of data

4.3.3 Descriptive statistics for multiple groups of data

4.4 Other graphical explorations

Chapter 5: Regression Analysis

5.1 Simple linear regression

5.1.1 Model overview

5.1.2 Function overview

5.1.3 Case study: simple regression modeling on the iris dataset

5.2 Multiple linear regression

5.2.1 Model overview

5.2.2 Case study: multiple regression modeling on the iris dataset

5.3 Variable selection

5.3.1 Stepwise regression: method and functions

5.3.2 Case study: stepwise regression modeling on the swiss dataset

5.3.3 Ridge regression: method and functions

5.3.4 Case study: exploring ridge regression on the longley dataset

5.3.5 Lasso regression: method and functions

5.3.6 Case study: lasso regression modeling on the longley dataset

5.4 Logistic regression

5.4.1 Model overview

5.4.2 Function overview

5.4.3 Case study: logistic regression modeling on the iris dataset

Chapter 6: Analysis of Variance

6.1 One-way analysis of variance

6.1.1 Model overview

6.1.2 Function overview

6.1.3 Case study: differences in cholesterol reduction across treatment methods

6.2 Two-way analysis of variance

6.2.1 Model overview

6.2.2 Case study: differences in rat gestational weight across doses

6.3 Analysis of covariance

6.3.1 Model overview

6.3.2 Function overview

6.3.3 Case study: analysis of covariance on the hotdog dataset

Chapter 7: Principal Component Analysis and Factor Analysis

7.1 The basic dimensionality reduction method: principal component analysis

7.1.1 Theoretical basis: linear combinations of the original variables

7.1.2 Model overview

7.1.3 Function overview

7.1.4 Case study: variable dimensionality reduction and regression on the longley dataset

7.1.5 Case study: variable dimensionality reduction and regression on the longley dataset (principal component regression)

7.2 An extension: factor analysis

7.2.1 Theoretical basis: combining many variables into a few factors

7.2.2 Model overview

7.2.3 Function overview

7.2.4 Case study: exploratory factor analysis of ability and intelligence tests

Chapter 8: Discriminant Analysis

8.1 Distance discriminant method

8.1.1 Theoretical basis: assign to whichever class is nearest

8.1.2 Function overview

8.1.3 Case study: classifying the iris dataset with distance discrimination

8.2 Bayes discriminant method

8.2.1 Theoretical basis: prior probabilities and misclassification loss

8.2.2 Function overview

8.2.3 Case study: Bayes discriminant analysis on the iris dataset

8.3 Fisher discriminant method

8.3.1 Theoretical basis: projection

8.3.2 Function overview

8.3.3 Case study: classifying the iris dataset with Fisher discrimination

Chapter 9: Conventional Cluster Analysis

9.1 A closer look at cluster analysis

9.1.1 Differences and classification

9.1.2 Mainstream clustering algorithms

9.2 Dynamic clustering

9.2.1 The basic clustering process

9.2.2 Function overview

9.2.3 Case study: dynamic clustering on randomly generated sequences

9.3 Hierarchical clustering

9.3.1 The basic clustering process

9.3.2 Function overview

9.3.3 Case study: hierarchical clustering on the UScitiesD dataset

9.4 Density-based clustering

9.4.1 The basic clustering process

9.4.2 Function overview

9.4.3 Case study: density-based clustering on randomly generated sequences

9.5 EM clustering

9.5.1 The basic clustering process

9.5.2 Function overview

9.5.3 Case study: EM clustering on the iris dataset

Chapter 10: Association Rules

10.1 Simple association rules

10.1.1 Basic concepts and representation

10.1.2 Evaluating the validity and usefulness of simple association rules

10.2 Sequential association rules

10.2.1 Differences and basic concepts

10.2.2 Generating sequential association rules

10.3 The Apriori algorithm

10.3.1 Algorithm overview: mining frequent itemsets

10.3.2 Function overview

10.3.3 Case study: association rule mining on the Titanic dataset

10.4 The Eclat algorithm

10.4.1 Algorithm overview: bottom-up search

10.4.2 Function overview

10.4.3 Case study: association rule mining on US census data

10.5 The SPADE algorithm

10.5.1 Algorithm overview: search and joins based on the sequence lattice

10.5.2 Function overview

10.5.3 Case study: sequential association rule mining on the zaki dataset

Chapter 11: Neural Networks

11.1 A closer look at artificial neural networks

11.1.1 Biological neurons

11.1.2 The artificial neuron model

11.1.3 Types of artificial neural networks

11.1.4 General steps for building a model

11.2 BP (backpropagation) networks

11.2.1 The BP network model

11.2.2 Algorithm overview

11.2.3 Function overview

11.3 Case study: modeling suburban Boston house prices with the Boston data