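Problems I would like solved:
1. How to get the floor-number information for the target URL using requests.
2. When fetching the page source with selenium + chrome, how to get past the endless page load and obtain source that contains the floor numbers.
Please test your answer before posting; no armchair theorizing.
Target URL: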
https://tieba.baidu.com/p/558...
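Goal: collect the floor number and nickname of every post in the target thread and save them in a dictionary for later use.
The source returned by requests does not contain the floor numbers (1楼, 2楼, 5楼 and so on, i.e. floor 1, floor 2, floor 5), which is exactly the field this task needs. How can I get source that includes them? My guess is that the page is rendered via ajax, so a plain requests call cannot see them.
Yet for another Baidu Tieba thread (https://tieba.baidu.com/p/483...), requests does return source that contains the floor numbers. To make the script more broadly compatible I fell back to selenium + chrome, but get() on the target URL then loads endlessly. I tried to stop the endless load and let the script continue as follows: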
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
# Limit how long a page load may take, in seconds
driver.set_page_load_timeout(5)
driver.maximize_window()
try:
    driver.get('http://tieba.baidu.com/p/5580899300')
except TimeoutException:
    print('time out after 5 seconds when loading page')
    # Try to abort the pending load so the script can continue
    driver.execute_script('window.stop()')
print(driver.page_source)
driver.quit()
The script then fails when it reaches this statement:
driver.execute_script('window.stop()')
The error output is:
time out after 5 seconds when loading page
Traceback (most recent call last):
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_launcher_nodebug.py", line 74, in run
    _vspu.exec_file(file, globals_obj)
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_util.py", line 119, in exec_file
    exec_code(code, file, global_variables)
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_util.py", line 95, in exec_code
    exec(code_obj, global_variables)
  File "d:\XXX\XXX\sf.py", line 13, in <module>
    driver.execute_script('window.stop()')
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 627, in execute_script
    'args': converted_args})['value']
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: chrome=64.0.3282.140)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 6.1.7601 SP1 x86_64)
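A possible way around the hang, offered as a sketch and not as a verified fix: the second timeout probably occurs because the driver is still waiting for the pending navigation when window.stop() is sent, so an alternative is to tell the driver up front not to wait for the full load event. The code below assumes Selenium 4 or later (newer than the chromedriver 2.35 shown above), where ChromeOptions exposes page_load_strategy. The selector div.l_post is a guess at the post container, the function name fetch_page_source is hypothetical, and none of this has been run against the target thread.

```python
# Assumed values, not taken from the original post.
PAGE_LOAD_STRATEGY = "eager"   # return from get() once the DOM is ready
POST_SELECTOR = "div.l_post"   # guessed CSS selector for one post container


def fetch_page_source(url, wait_seconds=10):
    """Open url in Chrome without waiting for the full load event.

    Selenium is imported lazily so that this module can be loaded
    on a machine where Selenium is not installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    # "eager" stops waiting at DOMContentLoaded; "none" returns immediately.
    options.page_load_strategy = PAGE_LOAD_STRATEGY
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait explicitly for the element we care about instead of the
        # whole page; raises TimeoutException if it never appears.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, POST_SELECTOR))
        )
        return driver.page_source
    finally:
        # Always close the browser, even when the wait times out.
        driver.quit()
```

Calling fetch_page_source('http://tieba.baidu.com/p/5580899300') would then return the source as soon as the first matching element exists, provided the selector guess is right. If a stalled resource is what keeps the page loading, this avoids the need for window.stop() altogether.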
Sometimes looking in the wrong direction is pretty miserable too (tested myself, no armchair theorizing).
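One thing worth checking before reaching for a browser: the floor number may already be present in the HTML that requests returns, just not as visible text. The sketch below assumes, without confirmation from the post above, that each post container carries a data-field attribute holding JSON with content.post_no (the floor number) and author.user_name (the nickname). Those key names, the helper extract_floors, and the sample HTML are all illustrative assumptions, so inspect the real source first.

```python
import json
from html.parser import HTMLParser


class PostFieldParser(HTMLParser):
    """Collect {floor number: nickname} from tags with a data-field attribute."""

    def __init__(self):
        super().__init__()
        self.floors = {}

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-field")
        if not raw:
            return
        try:
            field = json.loads(raw)
        except ValueError:
            return  # attribute is not valid JSON, skip it
        if not isinstance(field, dict):
            return
        content = field.get("content")
        author = field.get("author")
        if not isinstance(content, dict) or not isinstance(author, dict):
            return
        # Key names below are assumptions about the page's JSON layout.
        post_no = content.get("post_no")
        name = author.get("user_name")
        if post_no is not None and name is not None:
            self.floors[post_no] = name


def extract_floors(html):
    """Return a dict mapping floor number to nickname for the given HTML."""
    parser = PostFieldParser()
    parser.feed(html)
    return parser.floors


# Made-up sample mimicking the assumed structure, not real page source.
SAMPLE_HTML = """
<div class="l_post" data-field='{"author":{"user_name":"alice"},"content":{"post_no":1}}'></div>
<div class="l_post" data-field='{"author":{"user_name":"bob"},"content":{"post_no":2}}'></div>
<div class="l_post" data-field='{"author":{"user_name":"carol"},"content":{"post_no":5}}'></div>
"""

print(extract_floors(SAMPLE_HTML))  # {1: 'alice', 2: 'bob', 5: 'carol'}
```

If the assumption holds for the target thread, passing requests.get(url).text to extract_floors would yield the dictionary directly, with no need for selenium.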