Unable to scrape data from xueqiu

Question

Posted on
2021-02-01

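Xueqiu (雪球) turns out to be hard to scrape. Python + Selenium did not work, so I tried a setup I had heard was very effective: Python + mitmproxy + the Chrome browser. I still get banned. How does Xueqiu detect that Chrome is being driven by a program?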

cat addons.py
import mitmproxy.http

# Path prefix of the requests whose responses should be saved.
url_paths = '/s/?page='


class Jobinfo:
    def response(self, flow: mitmproxy.http.HTTPFlow):
        # Append the body of every matching response to target.txt.
        if flow.request.path.startswith(url_paths):
            text = flow.response.get_text()
            if text:
                with open('target.txt', mode='a', encoding='utf-8') as file_handle:
                    file_handle.write(text)
                    file_handle.write('\n\n')


addons = [Jobinfo()]
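
The prefix check in the addon only matches when `page` is the first query parameter. As a minimal sketch of a more tolerant check, the path can be parsed with the standard library instead. The helper name `is_listing_request` is hypothetical and not part of mitmproxy; it assumes the input is the path plus query string, as in `flow.request.path`.

```python
from urllib.parse import parse_qs, urlsplit


def is_listing_request(path: str) -> bool:
    """Return True for paths such as '/s/?page=1', wherever `page` sits in the query."""
    parts = urlsplit(path)
    query = parse_qs(parts.query)
    # Match on the exact path and on the presence of a non-empty `page` parameter.
    return parts.path == '/s/' and 'page' in query


if __name__ == '__main__':
    for candidate in ('/s/?page=1', '/s/?foo=bar&page=2', '/s/other?page=1'):
        print(candidate, is_listing_request(candidate))
```

Inside the addon, `flow.request.path.startswith(url_paths)` could then be swapped for `is_listing_request(flow.request.path)`.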

Start mitmdump:

mitmdump -s addons.py

Launch the browser:

google-chrome --proxy-server=127.0.0.1:8080 --ignore-certificate-errors

After opening the link below:
https://xueqiu.com/s/?page=1&...

target.txt does not contain the table data I need, only other, unrelated content.
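
One possible explanation, not confirmed anywhere in this post, is that the table is filled in by a separate request whose path does not start with `/s/?page=`, so the filter never sees it. A rough way to check is to log every response that passes through the proxy and then look for the one carrying the data. The sketch below is only a diagnostic aid: the class name `ResponseIndex`, the helper `describe`, and the output file `responses.tsv` are hypothetical names chosen for illustration.

```python
def describe(flow) -> str:
    """Build one tab-separated summary line: URL, content type, body size in bytes."""
    content_type = flow.response.headers.get('content-type', '')
    size = len(flow.response.content or b'')
    return f'{flow.request.pretty_url}\t{content_type}\t{size}'


class ResponseIndex:
    """Hypothetical addon: record a summary of every response seen by the proxy."""

    def __init__(self, out_path='responses.tsv'):
        self.out_path = out_path

    def response(self, flow):
        # mitmproxy calls this hook once per completed response.
        with open(self.out_path, mode='a', encoding='utf-8') as fh:
            fh.write(describe(flow) + '\n')


addons = [ResponseIndex()]
```

Saved as its own script and loaded with `mitmdump -s`, this should produce one line per response. If a JSON response with a plausible size shows up under a different path, that path is the one to filter on.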

Tags: python, web crawler
