Target page: https://www.bse.cn/nq/listedcompany.html
POST request for the first page of data:
import requests
url = "https://www.bse.cn/nqxxController/nqxxCnzq.do?callback=jQuery331_1678283984630"
payload = {
    "page": "0",
    "typejb": "T",
    "xxfcbj[]": "2",
    "xxzqdm": "",
    "sortfield": "xxzqdm",
    "sorttype": "asc",
}
r = requests.post(url, data=payload)
data_text = r.text
Why does the code above fail to return any data? Inspecting data_text shows a 403 response:
data_text
'<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'
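One plausible cause, not verified against this server: the request above carries none of the headers a browser sends, and requests identifies itself with a python-requests User-Agent by default, which some servers reject. The sketch below builds the same POST with browser-like headers. It uses only the standard library so it runs without extra packages; build_request, fetch_page, and BROWSER_HEADERS are illustrative names, and which headers the server actually checks is an assumption.

```python
import urllib.parse
import urllib.request

URL = "https://www.bse.cn/nqxxController/nqxxCnzq.do?callback=jQuery331_1678283984630"

# Header values copied from the curl command in this post. Whether the server
# requires each of them is an assumption; trim the set by experiment.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0",
    "Accept": "text/javascript, application/javascript, application/ecmascript, "
              "application/x-ecmascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "Origin": "https://www.bse.cn",
    "Referer": "https://www.bse.cn/nq/listedcompany.html",
}


def build_request(page):
    """Build (but do not send) the POST request for one page of results."""
    payload = {
        "page": str(page),
        "typejb": "T",
        "xxfcbj[]": "2",
        "xxzqdm": "",
        "sortfield": "xxzqdm",
        "sorttype": "asc",
    }
    # urlencode produces the same form body that the browser sends.
    body = urllib.parse.urlencode(payload).encode("ascii")
    return urllib.request.Request(URL, data=body, headers=BROWSER_HEADERS, method="POST")


def fetch_page(page):
    """Send the request and return the raw response bytes."""
    with urllib.request.urlopen(build_request(page), timeout=10) as resp:
        return resp.read()
```

With requests, the same dictionary can be passed as requests.post(url, data=payload, headers=BROWSER_HEADERS).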
With curl the request succeeds. The output is binary data, so I save it to a file:
curl 'https://www.bse.cn/nqxxController/nqxxCnzq.do?callback=jQuery331_1678285734365' -X POST -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0' -H 'Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01' -H 'Accept-Language: en-GB,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' -H 'X-Requested-With: XMLHttpRequest' -H 'Origin: https://www.bse.cn' -H 'Connection: keep-alive' -H 'Referer: https://www.bse.cn/nq/listedcompany.html' -H 'Cookie: Hm_lvt_ef6193a308904a92936b38108b93bd7f=1678283985; Hm_lpvt_ef6193a308904a92936b38108b93bd7f=1678285736' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-origin' --data-raw 'page=1&typejb=T&xxfcbj%5B%5D=2&xxzqdm=&sortfield=xxzqdm&sorttype=asc' -o /tmp/test.data
This does return data, but how do I parse it?
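A possible explanation for the binary output: the curl command sends Accept-Encoding: gzip, deflate, br but does not use --compressed, so curl writes the compressed body to the file unchanged. The callback= parameter in the URL also suggests the body is JSONP, that is, JSON wrapped in a function call such as jQuery331_1678285734365(...). The sketch below handles both under those assumptions. decode_body and strip_jsonp are illustrative names, brotli is not handled because the standard library has no decoder for it, and the structure of the resulting JSON is not known from this post.

```python
import gzip
import json
import re
import zlib


def decode_body(raw):
    """Return the response body as text, undoing gzip or deflate if present."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    else:
        try:
            raw = zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            pass  # assume the body was not compressed; brotli is not handled
    return raw.decode("utf-8")


def strip_jsonp(text):
    """Parse the JSON payload out of a JSONP response like callback({...})."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))
```

To try it on the saved file, read /tmp/test.data in binary mode and call strip_jsonp(decode_body(raw_bytes)). If decoding raises UnicodeDecodeError, the body may be brotli-compressed; removing br from the Accept-Encoding header, or adding --compressed to the curl command, avoids that case.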