BeautifulSoup 返回 None 即使该元素存在

Question

新手上路，请多包涵

我已经浏览了针对类似问题的大部分解决方案，但没有找到一个有效的解决方案，更重要的是，还没有找到解释为什么在被抓取的网站上调用 Javascript 或其他东西时会发生这种情况。

我正在尝试从网站上抓取游戏“Officials”的表格：http: //www.pro-football-reference.com/boxscores/201309050den.htm

我的代码是：

 url = "http://www.pro-football-reference.com/boxscores/201309050den.htm"
html = urlopen(url)
bsObj = BeautifulSoup(html, "lxml")
officials = bsObj.findAll("table",{"id":"officials"})

for entry in officials:
    print(str(entry))

我现在只是打印到控制台，但是我得到一个带有 findAll 或 None 的空列表。我也用基本的 html.parser 尝试过这个，但没有成功。

对 html 有更好理解的人可以具体告诉我这个网页有什么不同吗？提前致谢！

原文由 scotche 发布，翻译遵循 CC BY-SA 4.0 许可协议

python web-scraping beautifulsoup

阅读 423

1 个回答

得票最新

社区维基

1

发布于
2023-01-08

试试这个代码：

 from selenium import webdriver
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url= "http://www.pro-football-reference.com/boxscores/201309050den.htm"
driver.maximize_window()
driver.get(url)

time.sleep(5)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
officials = soup.findAll("table",{"id":"officials"})

for entry in officials:
    print(str(entry))

driver.quit()

它将打印：

 <table class="suppress_all sortable stats_table now_sortable" data-cols-to-freeze="0" id="officials"><thead><tr class="thead onecell"><td class=" center" colspan="2" data-stat="onecell">Officials</td></tr></thead><caption>Officials Table</caption><tbody>
<tr data-row="0"><th class=" " data-stat="ref_pos" scope="row">Referee</th><td class=" " data-stat="name"><a href="/officials/ColeWa0r.htm">Walt Coleman</a></td></tr>
<tr data-row="1"><th class=" " data-stat="ref_pos" scope="row">Umpire</th><td class=" " data-stat="name"><a href="/officials/ElliRo0r.htm">Roy Ellison</a></td></tr>
<tr data-row="2"><th class=" " data-stat="ref_pos" scope="row">Head Linesman</th><td class=" " data-stat="name"><a href="/officials/BergJe1r.htm">Jerry Bergman</a></td></tr>
<tr data-row="3"><th class=" " data-stat="ref_pos" scope="row">Field Judge</th><td class=" " data-stat="name"><a href="/officials/GautGr0r.htm">Greg Gautreaux</a></td></tr>
<tr data-row="4"><th class=" " data-stat="ref_pos" scope="row">Back Judge</th><td class=" " data-stat="name"><a href="/officials/YettGr0r.htm">Greg Yette</a></td></tr>
<tr data-row="5"><th class=" " data-stat="ref_pos" scope="row">Side Judge</th><td class=" " data-stat="name"><a href="/officials/PattRi0r.htm">Rick Patterson</a></td></tr>
<tr data-row="6"><th class=" " data-stat="ref_pos" scope="row">Line Judge</th><td class=" " data-stat="name"><a href="/officials/BaynRu0r.htm">Rusty Baynes</a></td></tr>
</tbody></table>

原文由 thebadguy 发布，翻译遵循 CC BY-SA 3.0 许可协议

撰写回答

你尚未登录，登录后可以

和开发者交流问题的细节
关注并接收问题和回答的更新提醒
参与内容的编辑和改进，让解决方法与时俱进

推荐问题

Stack Overflow 翻译

子站问答

访问

本篇内容翻译自 Stack Overflow，如果你觉得翻译结果值得改进，欢迎直接编辑修改，感谢你为社区贡献。

相似问题

找不到问题？创建新问题

BeautifulSoup 返回 None 即使该元素存在

你尚未登录，登录后可以

字节的 trae AI IDE 不支持类似 vscode 的 ssh remote 远程开发怎么办？

DataCap 中验证码无法显示，后台出现 NullPointerException 错误?

发现深拷贝和浅拷贝效果一致：请问一下有什么区别呢？

如何实现一个深拷贝函数？

Python 成员变量在多个子类实例间共享，如何避免？

为什么 Qwen2.5-Omni-7B 官方教程都报错 Cannot import available module of Qwen2_5OmniModel in modelscope ？

Spark-TTS-0.5B 的 requirements.txt 在哪里？

Stack Overflow 翻译