Python 3: stuck on AJAX rendering and endless page loading when scraping Baidu Tieba?

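Questions I would like answered:
1. How can I get the floor numbers for the target URL using requests?
2. When fetching the source with selenium + chrome, how do I get past the endless page load and obtain source that contains the floor numbers?
Please test your answer before posting; no armchair theorizing.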

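Target URL:
https://tieba.baidu.com/p/558...
Goal: collect the floor number and nickname of every post in the target thread and store them in a dictionary for later use.
The source returned by requests does not contain the floor numbers (floor 1, floor 2, floor 5, and so on), which are exactly the field I need to collect. How can I get source that includes them? My guess is that the page is rendered via AJAX, so a plain requests call cannot see them.
Yet for another Baidu Tieba thread (https://tieba.baidu.com/p/483...), requests does return source containing the floor numbers. To make the script work for both kinds of thread, I fell back to selenium + chrome, but get() on the target URL never finishes loading. I tried to stop the endless load as follows so that the rest of the code could run: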

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
# Set the page-load timeout (in seconds)
driver.set_page_load_timeout(5)
driver.maximize_window()

try:
    driver.get('http://tieba.baidu.com/p/5580899300')
except:  
    print('time out after 5 seconds when loading page')
    driver.execute_script('window.stop()')
print(driver.page_source)
driver.quit()

Execution fails at the following statement:

driver.execute_script('window.stop()')

The error message is:

time out after 5 seconds when loading page
Traceback (most recent call last):
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_launcher_nodebug.py", line 74, in run
    _vspu.exec_file(file, globals_obj)
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_util.py", line 119, in exec_file
    exec_code(code, file, global_variables)
  File "C:\Users\Administrator\.vscode\extensions\ms-python.python-2018.1.0\pythonFiles\PythonTools\visualstudio_py_util.py", line 95, in exec_code
    exec(code_obj, global_variables)
  File "d:\XXX\XXX\sf.py", line 13, in <module>
    driver.execute_script('window.stop()')
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 627, in execute_script
    'args': converted_args})['value']
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: chrome=64.0.3282.140)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 6.1.7601 SP1 x86_64)
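The traceback shows that `window.stop()` itself timed out, which suggests the driver was still blocked on the pending navigation. A commonly suggested workaround is to tell the driver not to wait for the load event at all and to poll for the post elements instead. The following is a minimal sketch under stated assumptions, not a verified fix for this thread: it assumes Selenium 4 or later (where `Options.page_load_strategy` is available), and the helper names `make_driver` and `load_until_present` are hypothetical.

```python
import time


def make_driver():
    """Build a Chrome driver whose get() does not wait for the load event.

    Assumption: Selenium 4+, where Options exposes page_load_strategy.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.page_load_strategy = "none"  # get() returns without waiting for load
    return webdriver.Chrome(options=options)


def load_until_present(driver, url, css, wait_seconds=10.0, poll=0.5):
    """Open url, poll until an element matching css exists or time runs out.

    Returns (found, page_source). The page is stopped either way so that
    whatever has loaded so far can still be read.
    """
    driver.get(url)
    deadline = time.monotonic() + wait_seconds
    found = False
    while time.monotonic() < deadline:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR
        if driver.find_elements("css selector", css):
            found = True
            break
        time.sleep(poll)
    driver.execute_script("window.stop()")
    return found, driver.page_source
```

A call might look like `found, source = load_until_present(make_driver(), 'http://tieba.baidu.com/p/5580899300', 'div.l_post_bright')`, where the selector is the one used in the answer below the question. Whether the floor numbers are present in `source` at that point still has to be checked against the live page.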


# coding: utf-8
from __future__ import unicode_literals
import json

import requests
from pyquery import PyQuery as Q

r = requests.get('https://tieba.baidu.com/p/5580899300')
# Each post container carries its metadata as JSON in the data-field attribute
for post in Q(r.text)('div.l_post_bright'):
    d = json.loads(Q(post).attr('data-field'))
    # Nickname and floor number of the post
    print(d['author']['user_name'], d['content']['post_no'])
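Since the stated goal is a dictionary of floor numbers and nicknames, the same `data-field` idea can be sketched with only the standard library for parsing. This is a rough illustration, not a tested scraper for the live page: it assumes the `author.user_name` and `content.post_no` keys shown in the answer above, it scans every `data-field` attribute with a regular expression rather than selecting `div.l_post_bright`, and the function names `floors_to_nicknames` and `fetch_floor_map` are hypothetical.

```python
import html
import json
import re

# Matches data-field='...' or data-field="..."; the value is JSON,
# possibly HTML-escaped (&quot; instead of ").
DATA_FIELD_RE = re.compile(r"""data-field=(?P<q>['"])(?P<body>.*?)(?P=q)""", re.S)


def floors_to_nicknames(page_html):
    """Return {floor number: nickname} for every post found in page_html."""
    result = {}
    for match in DATA_FIELD_RE.finditer(page_html):
        try:
            data = json.loads(html.unescape(match.group("body")))
        except ValueError:
            continue  # not JSON, skip this attribute
        if not isinstance(data, dict):
            continue
        author = data.get("author")
        content = data.get("content")
        if not isinstance(author, dict) or not isinstance(content, dict):
            continue  # a data-field that does not describe a post
        if "user_name" in author and "post_no" in content:
            result[content["post_no"]] = author["user_name"]
    return result


def fetch_floor_map(url):
    """Download one thread page with requests and parse it (needs network)."""
    import requests

    response = requests.get(url, timeout=10)
    return floors_to_nicknames(response.text)
```

Keeping the parsing separate from the download means `floors_to_nicknames` can also be fed `driver.page_source` from selenium, so the same code serves both kinds of thread.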
