
I'm looking for a native way to parse HTTP requests in Python 3.

This question shows a way to do it in Python 2, but it relies on modules that are now deprecated (and on Python 2 itself). I'm looking for a way to do it in Python 3.

Mainly, I want to find out which resource is being requested and to parse the headers of a simple request, i.e.:

GET /index.html HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

Can someone show me a basic way to parse this request?

Original question by Startec; translation licensed under CC BY-SA 4.0.

python python-3.x http


2 answers


Community wiki

Posted on 2022-11-16

✓ Accepted answer

You can use the email.message.Message class from the standard library's email module.

Adapting the answer from the question you linked, here is a Python 3 example that parses the HTTP headers.

Suppose you want to build a dictionary containing all of the header fields:

import email
import pprint
from io import StringIO

request_string = 'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, sdch\r\nAccept-Language: en-US,en;q=0.8'

# split off the request line (e.g. 'GET / HTTP/1.1'), which holds the method,
# the requested resource and the HTTP version, so only the headers are parsed
request_line, headers = request_string.split('\r\n', 1)

# construct a message from the header lines
message = email.message_from_file(StringIO(headers))

# construct a dictionary containing the headers
# (if a header is repeated, the dict keeps only its last value)
headers = dict(message.items())

# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)

If you run this at the Python prompt, the result looks like this:

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'en-US,en;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Host': 'localhost',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
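The example above only looks at the headers, although the question also asks which resource is requested. A minimal sketch that keeps both, assuming the raw request arrives as bytes (as it would from a socket): read the first line for the method, target and version, then hand the rest to http.client.parse_headers, which returns an http.client.HTTPMessage (a subclass of email.message.Message). Keeping the message object instead of converting it to a dict preserves case-insensitive lookups and repeated headers. The helper name parse_raw_request is hypothetical, and the sketch does no validation of the request line.

```python
import http.client
from io import BytesIO
from urllib.parse import parse_qs, urlsplit


def parse_raw_request(raw: bytes):
    """Hypothetical helper: split a raw HTTP/1.x request into its parts."""
    fp = BytesIO(raw)
    # The request line looks like b"GET /index.html HTTP/1.1\r\n".
    request_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    method, target, version = request_line.split(" ", 2)
    # parse_headers reads header lines up to the blank line (or EOF)
    # and returns an http.client.HTTPMessage.
    headers = http.client.parse_headers(fp)
    return method, target, version, headers


raw = (
    b"GET /search?q=python&lang=en HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Accept-Language: en-US,en;q=0.8\r\n"
    b"\r\n"
)
method, target, version, headers = parse_raw_request(raw)
url = urlsplit(target)
print(method, url.path, version)            # GET /search HTTP/1.1
print(parse_qs(url.query))                  # {'q': ['python'], 'lang': ['en']}
print(headers["host"])                      # localhost:8080 (case-insensitive lookup)
print(headers.get_all("Accept-Language"))   # ['en-US,en;q=0.8']
```

urlsplit separates the path from the query string, which matters when the requested resource carries parameters.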

Answer by Corey Goldberg; translation licensed under CC BY-SA 3.0.

Community wiki

Posted on 2022-11-16

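Each header field ends with a carriage return followed by a line feed (\r\n), and within a field the name and the value are separated by a colon. So, assuming you already have the raw request as a string (resp below), it should be as simple as: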

fields = resp.split("\r\n")
fields = fields[1:]  # skip the request line, e.g. "GET / HTTP/1.1"
output = {}
for field in fields:
    key, value = field.split(':', 1)  # split at the first colon only, so values like "localhost:8080" survive
    output[key] = value
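The same splitting idea can be made somewhat more robust. A minimal sketch, assuming resp is a str with \r\n line endings: keep the request line instead of discarding it, stop at the first empty line (the end of the header block), split each field only at its first colon, strip surrounding whitespace, and collect repeated field names into a list. The helper name split_request and the choice to lower-case field names (HTTP field names are case-insensitive) are assumptions of this sketch, not part of the answer above.

```python
def split_request(resp: str):
    """Hypothetical helper: return the request line and a dict of header lists."""
    lines = resp.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            # An empty line ends the header block; anything after it is the body.
            break
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Field names are case-insensitive, so normalise them to lower case.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return request_line, headers


resp = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: localhost:8080\r\n"
    "Accept: text/html\r\n"
    "Accept: application/xml\r\n"
    "\r\n"
)
request_line, headers = split_request(resp)
print(request_line)        # GET /index.html HTTP/1.1
print(headers["host"])     # ['localhost:8080']
print(headers["accept"])   # ['text/html', 'application/xml']
```

Storing lists costs a little convenience, but it avoids silently dropping a header that appears more than once.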

Update 4/13

Here it is with the example HTTP request from the linked post:

resp = 'GET /search?sourceid=chrome&ie=UTF-8&q=ergterst HTTP/1.1\r\nHost: www.google.com\r\nConnection: keep-alive\r\nAccept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13\r\nAccept-Encoding: gzip,deflate,sdch\r\nAvail-Dictionary: GeNLY2f-\r\nAccept-Language: en-US,en;q=0.8\r\n'

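fields = resp.split("\r\n")
fields = fields[1:]  # skip the request line, e.g. "GET /search?... HTTP/1.1"
output = {}
for field in fields:
    if not field:  # skip the empty string left by the trailing \r\n
        continue
    key, value = field.split(':', 1)  # split at the first colon only
    output[key] = value
print(output)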

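The extra check is needed because the request ends with \r\n, so the last element of the split is an empty string. Note that each value keeps the space that followed the colon; call value.strip() if you don't want it. Output: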

{'Host': ' www.google.com', 'Connection': ' keep-alive', 'Accept': ' application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'User-Agent': ' Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13', 'Accept-Encoding': ' gzip,deflate,sdch', 'Avail-Dictionary': ' GeNLY2f-', 'Accept-Language': ' en-US,en;q=0.8'}
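Another standard-library route reuses the request parser behind http.server. BaseHTTPRequestHandler.parse_request() reads self.raw_requestline and self.rfile and fills in command, path, request_version and headers. The subclass below feeds it a BytesIO instead of a socket and overrides send_error so that a malformed request is recorded rather than written to a client. This is a sketch under the assumption that depending on http.server internals is acceptable: the class name HTTPRequest is hypothetical, and because the base-class __init__ is deliberately skipped, behaviour may differ between Python versions.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Hypothetical wrapper: parse a raw request without opening a socket."""

    def __init__(self, request_bytes: bytes):
        # Skip the base-class __init__, which expects a live connection.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the error instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
request = HTTPRequest(raw)
print(request.error_code)        # None
print(request.command)           # GET
print(request.path)              # /index.html
print(request.headers["host"])   # localhost
```

If error_code is not None after construction, the request line or headers were rejected and the other attributes may be unset.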

Answer by Liam Kelly; translation licensed under CC BY-SA 4.0.
