Python Web Crawlers (Python網絡爬蟲)

Geng Xinglong, Hu Zhongyue, Zhou Xiang

  • Publisher: Publishing House of Electronics Industry (電子工業)
  • Publication date: 2023-03-01
  • List price: $294
  • Sale price: $250 (15% off)
  • Language: Simplified Chinese
  • Pages: 220
  • ISBN: 7121438100
  • ISBN-13: 9787121438103
  • Related categories: Web crawler
  • Ordered from the supplier once you place your order (approx. 4 to 6 weeks)

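Product Description

This book shows how to develop web crawler programs with Python. Starting from the basic features of the Python language, it covers the various aspects of Python web crawler development in detail, touching on HTTP, HTML, JavaScript, regular expressions, natural language processing, data science, and other areas. The book has 10 chapters, covering topics that include Python fundamentals, website analysis, web page parsing, Python file reading and writing, Python and databases, AJAX, simulated login, text and data analysis, website testing, the Scrapy crawler framework, and crawler performance. It can serve as a course textbook for computer-related programs at higher vocational colleges and as a reference for practitioners in computing-related fields.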

Table of Contents

Project 1: Python Fundamentals
Task 1: Python Overview
1. Introduction to Python
2. Installing Python
3. Installing PyCharm
4. Python Syntax Conventions
Task 2: Components of Python Statements
1. Basic Symbols
2. Constants and Variables
3. Data Types
4. Functional Symbols
Task 3: Program Structure
1. Expression Statements
2. Sequential Structure
3. Selection Structure
4. Loop Structure
5. Conditional Expressions
6. Program Flow Control
Project Practice
Exercise: Printing the Baidu URL
Project 2: Web Crawler Fundamentals
Task 1: Web Crawler Overview
1. Basic Principles of Web Crawlers
2. Web Crawler System Architecture
3. Crawling Strategies
4. Types of Web Crawlers
5. Open-Source Web Crawler Frameworks and Projects
Task 2: HTTP
1. How HTTP Works
2. The Urllib Module Library
3. URL Definition
4. URL Encoding Settings
Task 3: The Web Page Request Process
1. Sending the Request Message
2. Returning the Response
3. HTTP Messages
Project Practice
Exercise 1: Searching for Product URLs
Exercise 2: Searching for Food Price URLs
Project 3: Using the Urllib Request Library
Task 1: Sending Web Page Requests
1. Basic HTTP Requests
2. Network Requests with Request
3. Setting Request Headers
4. Sending Requests with Handler Methods
5. Setting a Proxy IP
6. Authentication
Task 2: Downloading Web Pages
1. Web Page Structure
2. Writing Web Page Files
3. Downloading Web Page Files
Project Practice
Exercise 1: Downloading a Python Learning Website
Exercise 2: Downloading a Company Web Page's HTML File
Project 4: Installing the Urllib3 Request Library and Sending Requests
Task 1: Installing the Urllib3 Request Library
1. Installing Anaconda
2. Installing the Urllib3 Library
Task 2: Sending Requests
1. Creating a Proxy Object
2. Request Methods
3. Defining Request Headers
4. Setting a Proxy IP
5. Automatic Retries
6. Redirects
Project Practice
Exercise: Sending a Request to Access Taobao
Project 5: Using the Requests Request Library
Task 1: Web Page Requests
1. Standard HTTP Requests
2. Returned Response Messages
3. JSON Data
Task 2: Request-Sending Methods
1. Sending GET Requests
2. Sending POST Requests
3. Other Request Methods
Task 3: Complex Network Requests
1. Complex Request Headers
2. Uploading Files
3. Cookie Verification
4. Session Persistence
Task 4: Exception Handling
1. The try-except Statement
2. Urllib Exception Handling Module
3. Urllib3 Exception Handling Module
4. Requests Exception Handling Module
Project Practice
Exercise: Crawling the URLs of Douban's Most Popular Film Reviews
Project 6: Parsing Web Pages
Task 1: Parsing Web Pages with Regular Expressions
1. Regular Expression Patterns
2. Implementing Regular Expressions with the re Module
3. String Searching
4. String Replacement
5. String Splitting
Task 2: Parsing Web Pages with XPath
1. XPath Overview
2. XPath Web Page Parsing
3. Getting Node Information
4. Node Relationships
5. Finding Node Information
6. Attribute Nodes
7. XPath Operators
8. XML Node Axes
Task 3: Parsing Web Pages with BeautifulSoup
1. Installing BeautifulSoup
2. Creating a BeautifulSoup Object
3. Getting Node Content by Attribute
4. Getting Nodes by Node Relationship
5. Finding Node Content
6. Finding Node Content with CSS Selectors
Project Practice
Exercise 1: Retrieving the Postal Code and Area Code of Shijiazhuang, Hebei Province, from a Lookup Website
Exercise 2: Crawling the Titles of Best-Selling Books
Exercise 3: Downloading Images of Best-Selling Books
Project 7: The Scrapy Web Crawler Framework
Task 1: Scrapy Web Crawler Framework Fundamentals
1. Scrapy Web Crawler Framework Basics
2. Common Scrapy Commands
3. Creating a Scrapy Project
Task 2: Creating Spider Files from Templates
1. Commands for Creating Web Crawler Files
2. Creating a basic Template File
3. Creating a crawl Template File
4. Creating a csvfeed Template File
5. Creating an xmlfeed Template File
Task 3: Scrapy Web Crawler Files
1. The Spider Class
2. Configuring the Web Crawler
3. Starting the Web Crawler
4. Extracting Data
Project Practice
Exercise: Extracting Scenic Spot Names