Qichacha can't be searched with a Selenium headless browser: https://www.qichacha.com/

Searching on Qichacha doesn't work in a headless browser.


rockswang (posted 2018-06-08)

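The site probably uses anti-scraping measures. Selenium still leaves some detectable traces; for example, it adds special properties to the global JavaScript objects, such as `navigator.webdriver` being set to true.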
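One way to see such traces is to ask the page itself. This is a minimal sketch that assumes Selenium 4 (for the `options=` keyword). The helper names `automation_traces` and `check_site` and the result labels are hypothetical. `navigator.webdriver` and a `HeadlessChrome` user-agent token are only two common signals; they are not a list of what Qichacha actually checks.

```python
# Sketch: report a few JavaScript-visible traces that a WebDriver-controlled
# browser commonly exposes. Only relies on driver.execute_script(), so any
# Selenium WebDriver instance works.

# execute_script() converts the returned JS object into a Python dict.
# The keys "webdriver" and "headless_ua" are hypothetical labels.
FINGERPRINT_JS = """
return {
    webdriver: navigator.webdriver === true,
    headless_ua: navigator.userAgent.indexOf('HeadlessChrome') !== -1
};
"""


def automation_traces(driver):
    """Return the sorted names of the checks that look automated."""
    result = driver.execute_script(FINGERPRINT_JS)
    return sorted(name for name, hit in result.items() if hit)


def check_site(url):
    """Open url in headless Chrome and print the traces it exposes."""
    from selenium import webdriver  # imported lazily; assumes Selenium 4

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        print(automation_traces(driver))
    finally:
        driver.quit()  # always shut down the browser process
```

Calling `check_site('https://www.qichacha.com')` should at least report `webdriver`, because browsers under WebDriver control set `navigator.webdriver` to true. Whether `headless_ua` also shows up depends on the Chrome version and headless mode.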

vibiu (posted 2018-06-08, updated 2018-06-08)

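I see you've already asked quite a few questions while trying to scrape Qichacha, so here's a heads-up:
If you run ChromeDriver in headless mode, scripts that a site inserts via document.write() may not get executed when you visit it. See the related question on Stack Overflow.
Example: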

>>> from selenium import webdriver
>>> option = webdriver.ChromeOptions()
>>> option.add_argument('--headless')
>>> driver = webdriver.Chrome(options=option)
[0608/163830.206:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.

DevTools listening on ws://127.0.0.1:60357/devtools/browser/36a1f861-d1ab-4cef-a5a9-3072bbada0fc
>>> driver.get('https://www.baidu.com')
[0608/163849.677:INFO:CONSOLE(715)] "A parser-blocking, cross site (i.e. different eTLD+1) script, https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/global/js/all_async_search_8d20902.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.", source: https://www.baidu.com/ (715)

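Here, https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/global/js/all_async_search_8d20902.js is written into the HTML via document.write() and only then loaded. Chrome may refuse to execute a parser-blocking, cross-site script inserted this way, which is what this console message warns about.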
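To check whether a given page triggers this warning, you can read Chrome's console output back through Selenium. This is a minimal sketch that assumes Selenium 4, and it assumes ChromeDriver forwards these console messages to the `browser` log once `goog:loggingPrefs` is set. The function names are hypothetical. Treating an upper-case `BLOCKED` as Chrome's "actually blocked" follow-up message is an assumption about Chrome's wording.

```python
# Sketch: flag Chrome's document.write() intervention messages in a list of
# browser log entries, such as the list returned by driver.get_log('browser').
# Assumption: entries are dicts with a 'message' key, and a script Chrome
# actually blocked is reported in a follow-up message containing 'BLOCKED'.

def document_write_warnings(entries):
    """Split document.write() intervention messages into (warned, blocked)."""
    warned, blocked = [], []
    for entry in entries:
        message = entry.get('message', '')
        if 'document.write' not in message:
            continue  # unrelated console output
        # The first warning only says the script "MAY be blocked" (lower-case
        # 'blocked'); an upper-case BLOCKED is assumed to mark a real block.
        (blocked if 'BLOCKED' in message else warned).append(message)
    return warned, blocked


def collect_chrome_log(url):
    """Load url in headless Chrome and return its browser console log."""
    from selenium import webdriver  # imported lazily; assumes Selenium 4

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    # Ask ChromeDriver to record all browser console messages.
    options.set_capability('goog:loggingPrefs', {'browser': 'ALL'})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.get_log('browser')
    finally:
        driver.quit()
```

For example: `warned, blocked = document_write_warnings(collect_chrome_log('https://www.baidu.com'))`.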

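Firefox doesn't have this problem, though, so I'd recommend Firefox's headless mode, or the PhantomJS headless browser (note that PhantomJS is no longer maintained and Selenium 4 dropped support for it).
Firefox example: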

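from selenium import webdriver

option = webdriver.FirefoxOptions()
option.add_argument('--headless')
driver = webdriver.Firefox(options=option)  # 'options=' replaces the deprecated 'firefox_options='
try:
    driver.get('https://www.qichacha.com')
    # your scraping logic goes here...
finally:
    driver.quit()  # always shut down the browser process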

Of course, you need to install Firefox before you can use it, along with geckodriver, which Selenium uses to drive Firefox.