Python Web Scraping, Day 2

Network Request Modules for Scraping: urllib (Part 1)

The urllib module

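1. What is the urllib module?
(1) A network request module that ships with Python's standard library, just like the re and time modules
(2) By contrast, third-party modules such as requests and Scrapy must be installed separately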

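2. Why learn urllib?
(1) It is a useful point of comparison when learning the third-party requests module
(2) Some scraping projects require urllib
(3) Combining urllib with requests sometimes gives more concise code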
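A common example of point (3) is building a correctly percent-encoded URL with `urllib.parse` and then passing it to either `urllib.request.urlopen()` or `requests.get()`. The sketch below is illustrative only. The helper name `build_search_url` is hypothetical, and the Baidu search endpoint `https://www.baidu.com/s` with its `wd` parameter is an assumption used purely as a sample URL.

```python
from urllib.parse import urlencode


def build_search_url(base, **params):
    """Hypothetical helper: percent-encode query parameters with urllib.parse
    so the URL can be handed to urlopen() or requests.get()."""
    # urlencode() encodes non-ASCII text as UTF-8 bytes and spaces as '+'
    return f'{base}?{urlencode(params)}'


# Assumed example endpoint and parameter name, for illustration only
url = build_search_url('https://www.baidu.com/s', wd='爬虫')
print(url)  # https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
```

The resulting string could then be fetched with `requests.get(url, headers=...)`. Note that requests can also encode parameters itself through `params=`, so this pairing is a matter of taste.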

urllib quick start

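1. Using urllib.request
(1) urllib.request.urlopen('URL')
(2) urllib.request.urlopen(request object)

   a. Create a request object that sets the User-Agent (UA)
   b. Get the response object with urlopen()
   c. Get the content of the response object with read().decode('utf-8')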
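Steps a to c above can be wrapped in one small function. This is a minimal sketch: the name `fetch_html` and its `timeout` default are hypothetical choices, and the UA string is the one used later in this article. The body is read only once, because a second `read()` on the same response returns empty bytes.

```python
import urllib.request

# Browser UA string taken from the article's own example
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36')


def fetch_html(url, user_agent=DEFAULT_UA, encoding='utf-8', timeout=10):
    """Hypothetical helper following steps a-c."""
    # a. Create a request object that carries the User-Agent
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    # b. Get the response object with urlopen()
    with urllib.request.urlopen(req, timeout=timeout) as res:
        # c. Read the bytes once and decode them into a str
        return res.read().decode(encoding)
```

Usage: `html = fetch_html('https://www.baidu.com/')`. Any URL scheme urllib supports will work, including `data:` URLs.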

Note (response object):

  print(res.getcode()) # get the HTTP status code
  print(res.geturl())  # get the URL actually retrieved (after any redirects)
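You can see the difference between the requested URL and `geturl()` without touching the internet. The sketch below starts a throwaway local server in which `/old` redirects to `/new`. The server, its paths and the helper name `inspect_response` are all hypothetical demo scaffolding. Recent Python documentation also prefers the equivalent attributes `res.status` and `res.url` over `getcode()` and `geturl()`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old redirects to /new, /new returns 'hello'."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'hello'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def inspect_response(path):
    # Port 0 lets the OS pick a free port
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f'http://127.0.0.1:{server.server_port}{path}'
        with urllib.request.urlopen(url) as res:
            return res.getcode(), res.geturl(), res.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    code, final_url, text = inspect_response('/old')
    print(code)       # status code of the final response
    print(final_url)  # ends with /new: urlopen followed the redirect
    print(text)
```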

2. Using urllib.parse (covered in Day 3)

Code:

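(1) Code: User-Agent. Sending a browser's User-Agent instead of the library's default keeps the request from being flagged as a bot by that header alone. This is usually the first step in getting past anti-scraping checks.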

 import requests
 url = 'https://www.baidu.com/'
 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
 }
 res = requests.get(url,headers=headers)
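 # print(res)  # <Response [200]> on success
 # res.request is the request that was actually sent; its headers include our User-Agent
 header = res.request.headers
 print(header)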
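To see what the header replaces, compare the default User-Agent each library would send. This is an offline sketch and sends nothing. It relies on `Session.prepare_request()`, which builds the request a session would send. It also reads the `addheaders` list of urllib's default opener. Exact version numbers depend on your installation.

```python
import urllib.request

import requests

url = 'https://www.baidu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

# urllib: the default opener adds this User-agent to requests that lack one
urllib_default_ua = dict(urllib.request.build_opener().addheaders)['User-agent']

# requests: prepare_request() builds the exact request without sending it
session = requests.Session()
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(requests.Request('GET', url, headers=headers))

print(urllib_default_ua)                  # Python-urllib/<Python version>
print(default_req.headers['User-Agent'])  # python-requests/<requests version>
print(custom_req.headers['User-Agent'])   # the browser UA from headers
```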

(2) Code: urllib_request

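 import urllib.request

 url = 'https://www.baidu.com/'
 header = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
 }

 # response is the response object
 response = urllib.request.urlopen(url)

 # 1. read() extracts the body of the response object as a byte stream (bytes);
 #    decode('utf-8') turns it into a str, and type() shows the data type.
 #    Read only once: a second read() on the same response returns b''.
 html = response.read().decode('utf-8')
 print(html, type(html))

 # 2. This data is not the page a browser gets: without a custom header, urllib sends
 #    its default User-Agent (Python-urllib/3.x), which identifies the request as a script.

 # (1) Create a request object and set the UA.
 #     urlopen() alone only sends a basic request; to add headers and similar
 #     information, build the request with the Request class.
 req = urllib.request.Request(url, headers=header)
 # (2) Get the response object with urlopen()
 res = urllib.request.urlopen(req)
 # (3) Get the content of the response object with read().decode('utf-8')
 content = res.read().decode('utf-8')
 print(content, type(content))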
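You can reproduce point 2 offline. The sketch below uses a throwaway local server that replies with whatever User-Agent it receives. It then calls plain `urlopen(url)` and `urlopen(Request(...))` against it. The server class and the helper name `seen_user_agents` are hypothetical demo scaffolding, not part of urllib.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUAHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: the response body is the received User-Agent."""

    def do_GET(self):
        body = (self.headers.get('User-Agent') or '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def seen_user_agents(custom_ua):
    server = HTTPServer(('127.0.0.1', 0), EchoUAHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f'http://127.0.0.1:{server.server_port}/'
    try:
        # Plain urlopen(url): urllib's default User-Agent is sent
        with urllib.request.urlopen(url) as res:
            default_ua = res.read().decode('utf-8')
        # urlopen(Request(...)): our own User-Agent replaces the default
        req = urllib.request.Request(url, headers={'User-Agent': custom_ua})
        with urllib.request.urlopen(req) as res:
            custom = res.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()
    return default_ua, custom


if __name__ == '__main__':
    print(seen_user_agents('Mozilla/5.0 (demo)'))
```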

(3) Code: downloading an image

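 import requests
 from urllib import request

 url = 'http://shp.qpic.cn/ishow/2735061617/1623836895_84828260_11275_sProdImgNo_2.jpg/0'

 # Method 1: requests + open()/write(); the file must be closed explicitly
 resp = requests.get(url)
 print(resp)  # e.g. <Response [200]>
 f = open('code_img1.png', 'wb')
 f.write(resp.content)  # .content is the raw bytes of the response body
 f.close()

 # Method 2: requests + a with block, which closes the file automatically
 resp = requests.get(url)
 with open('code-img2.png', 'wb') as f:
     f.write(resp.content)

 # Method 3: urllib.request.urlretrieve() downloads straight to a file
 request.urlretrieve(url, 'code_img3.png')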
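Unlike `urlretrieve()`, the plain `urlopen()` route lets you attach a User-Agent. You can also stream the body to disk in chunks instead of holding it in memory. The sketch below is one possible variant and is not one of the article's three methods. The helper name `save_url` is hypothetical. The `shutil.copyfileobj` approach is a standard-library idiom chosen here for illustration.

```python
import shutil
import urllib.request


def save_url(url, path, headers=None):
    """Hypothetical helper: download url to path using only urllib,
    optionally with custom headers such as a User-Agent."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as res, open(path, 'wb') as f:
        shutil.copyfileobj(res, f)  # copies the body in fixed-size chunks
    return path
```

Usage: `save_url(url, 'code_img4.jpg', headers={'User-Agent': '...'})`. The `...` stands for any browser UA string.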

国民好姐姐
