[Help] How do I use Python to scrape the content inside tr and td tags on a web page?


zhouxinyan

1113

Posted on
2018-05-19

Updated on
2018-05-21

New here, please bear with me.

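Could someone tell me how to scrape the number 2.1610 at the end of the grey (highlighted) part of this page's HTML?
[Image: screenshot of the page's HTML]

I also have a series of pages whose HTML is very similar to this one, and I want to scrape the number at the same position from each of them. How should I write this with BeautifulSoup?

My current code is below (the commented-out lines use the code from the first answer):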

from urllib.request import urlopen  # assumes Python 3
from bs4 import BeautifulSoup

def getLinks(articleUrl):
    html = urlopen(articleUrl)
    #s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'
    #soup = BeautifulSoup(s, 'html.parser')
    #print(soup.find_all('td')[1].contents[0][:-2])

web crawler beautifulsoup

python

14k views

1 answer

vibiu

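There are generally three ways to parse a web page in Python:
1. String methods
2. Regular expressions
3. An HTML/XML parsing library (such as the well-known BeautifulSoup)
For the example you gave, assume: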

>>> s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'

Because the text has very distinctive markers, it can be handled like this:
1. String methods:

>>> s.split('<td>')[-1].split('(d')[0]
'2.1610'
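
If `<td>` or `(d` is missing, the one-liner above does not fail; it silently returns whatever text is left. A minimal sketch of a more defensive variant, assuming the value always sits in the last `<td>` and is followed by `(d`. The function name `extract_by_split` is made up for illustration:

```python
def extract_by_split(row_html):
    """Return the text between the last '<td>' and the following '(d', or None."""
    # rpartition splits on the LAST occurrence of '<td>'
    _, td_marker, tail = row_html.rpartition('<td>')
    if not td_marker:
        return None  # no <td> in the input
    value, paren_marker, _ = tail.partition('(d')
    if not paren_marker:
        return None  # the '(d' suffix is missing
    return value.strip()

s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'
print(extract_by_split(s))
```

Returning `None` makes it easy to spot pages where the layout differs from the sample.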

2. Regular expressions (re):

>>> import re
>>> pattern = re.compile(r'</b></td><td>(.*)\(d<sub>')
>>> pattern.findall(s)
['2.1610']
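
The `(.*)` group is greedy, so it can capture too much if a line contains more than one `(d<sub>`. As a sketch, assuming the value is always a plain decimal number directly after `<td>` and directly before `(d<sub>`, the pattern can match the number itself. `NUMBER_PATTERN` and `extract_by_regex` are illustrative names, not part of the original answer:

```python
import re

# Capture a decimal number that sits between '<td>' and '(d<sub>'
NUMBER_PATTERN = re.compile(r'<td>\s*(\d+(?:\.\d+)?)\s*\(d<sub>')

def extract_by_regex(html):
    """Return the first matching number as a string, or None if nothing matches."""
    match = NUMBER_PATTERN.search(html)
    return match.group(1) if match else None

s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'
print(extract_by_regex(s))
```

The value is kept as a string so the trailing zero in `2.1610` is preserved; wrap it in `float()` if a number is needed.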

3. BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s, 'html.parser')
>>> soup.find_all('td')[1].contents[0][:-2]
'2.1610'
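
Indexing with `find_all('td')[1]` works on the single-row sample, but on a full page the index depends on how many cells come before that row. A hedged sketch that finds the row by its label text instead, assuming every page uses the label "Power law exponent" in the first cell and puts the number at the start of the second cell. `extract_exponent` and its `label` parameter are illustrative names:

```python
import re
from bs4 import BeautifulSoup

def extract_exponent(html, label='Power law exponent'):
    """Return the leading number from the cell next to the one containing `label`."""
    soup = BeautifulSoup(html, 'html.parser')
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) >= 2 and label in cells[0].get_text():
            # get_text() flattens nested tags such as <sub> into plain text
            match = re.match(r'\s*(\d+(?:\.\d+)?)', cells[1].get_text())
            return match.group(1) if match else None
    return None  # no row with that label

s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'
page = '<table><tr><td>Size</td><td>100</td></tr>' + s + '</table>'
print(extract_exponent(page))
```

Matching on the label keeps the code working when other rows are added or reordered, as long as the label text itself stays the same.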

All of the methods above were improvised for the given example.
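
The question also mentions a series of pages with nearly identical HTML. A rough sketch of looping over them, assuming Python 3 and UTF-8 pages. `fetch_html` and `scrape_all` are hypothetical helpers, the URLs in the usage comment are placeholders, and `extract` stands for whichever extraction method you choose, wrapped as a function that takes the HTML text:

```python
from urllib.request import urlopen

def fetch_html(url):
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')

def scrape_all(urls, extract, fetch=fetch_html):
    """Map each URL to its extracted value, or None if download or extraction fails."""
    results = {}
    for url in urls:
        try:
            results[url] = extract(fetch(url))
        except Exception as exc:  # broad on purpose: one bad page should not stop the loop
            print(f'skipped {url}: {exc}')
            results[url] = None
    return results

# Usage sketch with placeholder URLs (not real pages):
# values = scrape_all(['http://example.com/a', 'http://example.com/b'], my_extract)
```

Passing `fetch` as a parameter keeps the loop testable without network access, and pages that fail show up as `None` rather than aborting the run.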
