Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning (Chinese edition: Python 機器學習手冊:從數據預處理到深度學習)

By Chris Albon. Translated by Han Huichang (韓慧昌), Lin Ran (林然), and Xu Jiang (徐江).

Product Description

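This book takes a task-based approach to using Python for machine learning. It contains nearly 200 self-contained solutions, each with code that readers can copy and paste into their own programs. The solutions address the most common tasks that data scientists and machine learning engineers encounter when building models, ranging from basic matrix and vector operations through feature engineering to building neural networks. This is not an introductory machine learning book. It is meant as a desk reference for readers already familiar with machine learning theory and concepts, who can draw on its code to quickly solve the problems that come up in day-to-day machine learning development.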

About the Translators

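Han Huichang (韓慧昌) graduated from the University of Science and Technology Beijing and is a senior consultant at ThoughtWorks, with experience on AI projects for several large enterprises.
Lin Ran (林然) has more than six years of software development experience and more than four years of Python development experience, and has applied machine learning algorithms in industries including aviation, retail, logistics, automotive, and telecommunications.
Xu Jiang (徐江) graduated from KTH Royal Institute of Technology in Sweden with a degree in systems biology and previously worked at Thoughtworks Software Technology Co., Ltd.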

Table of Contents

Chapter 1. Vectors, Matrices, and Arrays
1.0 Introduction
1.1 Creating a Vector
1.2 Creating a Matrix
1.3 Creating a Sparse Matrix
1.4 Selecting Elements
1.5 Displaying the Properties of a Matrix
1.6 Applying an Operation to Multiple Elements at Once
1.7 Finding the Maximum and Minimum Values
1.8 Calculating the Mean, Variance, and Standard Deviation
1.9 Reshaping a Matrix
1.10 Transposing a Vector or Matrix
1.11 Flattening a Matrix
1.12 Calculating the Rank of a Matrix
1.13 Calculating the Determinant
1.14 Getting the Diagonal Elements of a Matrix
1.15 Calculating the Trace of a Matrix
1.16 Calculating Eigenvalues and Eigenvectors
1.17 Calculating Dot Products
1.18 Adding or Subtracting Matrices
1.19 Multiplying Matrices
1.20 Calculating the Inverse of a Matrix
1.21 Generating Random Numbers
Chapter 2. Loading Data
2.0 Introduction
2.1 Loading a Sample Dataset
2.2 Creating a Simulated Dataset
2.3 Loading a CSV File
2.4 Loading an Excel File
2.5 Loading a JSON File
2.6 Querying a SQL Database
Chapter 3. Data Wrangling
3.0 Introduction
3.1 Creating a Data Frame
3.2 Describing the Data
3.3 Navigating a Data Frame
3.4 Selecting Rows Based on Conditionals
3.5 Replacing Values
3.6 Renaming Columns
3.7 Calculating the Minimum, Maximum, Sum, Mean, and Count
3.8 Finding Unique Values
3.9 Handling Missing Values
3.10 Deleting a Column
3.11 Deleting a Row
3.12 Dropping Duplicate Rows
3.13 Grouping Rows by Values
3.14 Grouping Rows by Time Period
3.15 Looping Over a Column
3.16 Applying a Function to All Elements in a Column
3.17 Applying a Function to All Groups
3.18 Concatenating Multiple Data Frames
3.19 Merging Two Data Frames
Chapter 4. Handling Numerical Data
4.0 Introduction
4.1 Rescaling a Feature
4.2 Standardizing a Feature
4.3 Normalizing Observations
4.4 Generating Polynomial and Interaction Features
4.5 Transforming Features
4.6 Detecting Outliers
4.7 Handling Outliers
4.8 Discretizing Features
4.9 Grouping Observations Using Clustering
4.10 Deleting Observations with Missing Values
4.11 Imputing Missing Values
Chapter 5. Handling Categorical Data
5.0 Introduction
5.1 Encoding Nominal Categorical Features
5.2 Encoding Ordinal Categorical Features
5.3 Encoding Dictionaries of Features
5.4 Imputing Missing Class Values
5.5 Handling Imbalanced Classes
Chapter 6. Handling Text
6.0 Introduction
6.1 Cleaning Text
6.2 Parsing and Cleaning HTML
6.3 Removing Punctuation
6.4 Tokenizing Text
6.5 Removing Stop Words
6.6 Stemming Words
6.7 Tagging Parts of Speech
6.8 Encoding Text as a Bag of Words
6.9 Weighting Words by Importance
Chapter 7. Handling Dates and Times
7.0 Introduction
7.1 Converting Strings to Dates
7.2 Handling Time Zones
7.3 Selecting Dates and Times
7.4 Breaking Up Date Data into Multiple Features
7.5 Calculating the Time Difference Between Two Dates
7.6 Encoding Days of the Week
7.7 Creating a Lagged Feature
7.8 Using Rolling Time Windows
7.9 Handling Missing Values in Time Series
Chapter 8. Image Processing
8.0 Introduction
8.1 Loading Images
8.2 Saving Images
8.3 Resizing Images
8.4 Cropping Images
8.5 Smoothing Images
8.6 Sharpening Images
8.7 Enhancing Contrast
8.8 Isolating Colors
8.9 Binarizing Images
8.10 Removing Backgrounds
8.11 Detecting Edges
8.12 Detecting Corners
8.13 Creating Features for Machine Learning
8.14 Encoding Mean Color as a Feature
8.15 Encoding Color Histograms as Features
Chapter 9. Dimensionality Reduction Using Feature Extraction
9.0 Introduction
9.1 Reducing Features Using Principal Components
9.2 Reducing Features When Data Is Linearly Inseparable
9.3 Reducing Features by Maximizing Class Separability
9.4 Reducing Features Using Matrix Factorization
9.5 Reducing Features on Sparse Data
Chapter 10. Dimensionality Reduction Using Feature Selection
10.0 Introduction
10.1 Thresholding Numerical Feature Variance
10.2 Thresholding Binary Feature Variance
10.3 Handling Highly Correlated Features
10.4 Removing Features Irrelevant to Classification
10.5 Recursively Eliminating Features
Chapter 11. Model Evaluation
11.0 Introduction
11.1 Cross-Validating Models
11.2 Creating a Baseline Regression Model
11.3 Creating a Baseline Classification Model
11.4 Evaluating Binary Classifiers
11.5 Evaluating Binary Classifier Thresholds
11.6 Evaluating Multiclass Classifiers
11.7 Visualizing Classifier Performance
11.8 Evaluating Regression Models
11.9 Evaluating Clustering Models
11.10 Creating a Custom Evaluation Metric
11.11 Visualizing the Effect of Training Set Size
11.12 Generating a Report of Evaluation Metrics
11.13 Visualizing the Effect of Hyperparameter Values
Chapter 12. Model Selection
12.0 Introduction
12.1 Selecting the Best Model Using Exhaustive Search
12.2 Selecting the Best Model Using Randomized Search
12.3 Selecting the Best Model from Multiple Learning Algorithms
12.4 Including Data Preprocessing in Model Selection
12.5 Speeding Up Model Selection with Parallelization
12.6 Speeding Up Model Selection Using Algorithm-Specific Methods
12.7 Evaluating Performance After Model Selection
Chapter 13. Linear Regression
13.0 Introduction
13.1 Fitting a Line
13.2 Handling Interaction Effects Between Features
13.3 Fitting a Nonlinear Relationship
13.4 Reducing Variance with Regularization
13.5 Reducing Features with Lasso Regression
Chapter 14. Trees and Forests
14.0 Introduction
14.1 Training a Decision Tree Classifier
14.2 Training a Decision Tree Regressor
14.3 Visualizing a Decision Tree Model
14.4 Training a Random Forest Classifier
14.5 Training a Random Forest Regressor
14.6 Identifying Important Features in Random Forests
14.7 Selecting Important Features in Random Forests
14.8 Handling Imbalanced Classes
14.9 Controlling Tree Size
14.10 Improving Performance Through Boosting
14.11 Evaluating Random Forests with Out-of-Bag Error
Chapter 15. KNN
15.0 Introduction
15.1 Finding an Observation's Nearest Neighbors
15.2 Creating a KNN Classifier
15.3 Identifying the Best Neighborhood Size
15.4 Creating a Radius-Based Nearest Neighbor Classifier
Chapter 16. Logistic Regression
16.0 Introduction
16.1 Training a Binary Classifier
16.2 Training a Multiclass Classifier
16.3 Reducing Variance Through Regularization
16.4 Training a Classifier on Very Large Datasets
16.5 Handling Imbalanced Classes
Chapter 17. Support Vector Machines
17.0 Introduction
17.1 Training a Linear Classifier
17.2 Handling Linearly Inseparable Data Using Kernels
17.3 Calculating Predicted Class Probabilities
17.4 Identifying Support Vectors
17.5 Handling Imbalanced Classes
Chapter 18. Naive Bayes
18.0 Introduction
18.1 Training a Classifier for Continuous Data
18.2 Training a Classifier for Discrete and Count Data
18.3 Training a Naive Bayes Classifier for Data with Binary Features
18.4 Calibrating Predicted Probabilities
Chapter 19. Clustering
19.0 Introduction
19.1 Clustering Using K-Means
19.2 Speeding Up K-Means Clustering
19.3 Clustering Using Meanshift
19.4 Clustering Using DBSCAN
19.5 Clustering Using Agglomerative Hierarchical Clustering
Chapter 20. Neural Networks
20.0 Introduction
20.1 Preprocessing Data for Neural Networks
20.2 Designing a Neural Network
20.3 Training a Binary Classifier
20.4 Training a Multiclass Classifier
20.5 Training a Regressor
20.6 Making Predictions
20.7 Visualizing Training History
20.8 Reducing Overfitting with Weight Regularization
20.9 Reducing Overfitting with Early Stopping
20.10 Reducing Overfitting with Dropout
20.11 Saving Model Training Progress
20.12 Evaluating Neural Networks with k-Fold Cross-Validation
20.13 Tuning Neural Networks
20.14 Visualizing Neural Networks
20.15 Classifying Images
20.16 Improving Convolutional Neural Network Performance with Image Augmentation
20.17 Classifying Text
Chapter 21. Saving and Loading Trained Models
21.0 Introduction
21.1 Saving and Loading a scikit-learn Model
21.2 Saving and Loading a Keras Model