How can a Python crawler extract text separated by br tags?

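How can a Python crawler extract the content that sits above the br tags, that is, the "行业中位数" (industry median) and "支付宝" (Alipay) lines? I have only learned the basics of HTML and have not come across a void tag like br before, so I do not know how to handle it. I have already tried using the copied XPath of the tag directly, but no matter which node I target, it does not work. Thanks for any replies!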

<div style="position: absolute; display: none; border-style: solid; white-space: nowrap; z-index: 9999999; transition: left 0.4s cubic-bezier(0.23, 1, 0.32, 1), top 0.4s cubic-bezier(0.23, 1, 0.32, 1); background-color: rgba(50, 50, 50, 0.7); border-width: 0px; border-color: rgb(51, 51, 51); border-radius: 4px; color: rgb(255, 255, 255); font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 14px; font-family: &quot;Microsoft YaHei&quot;; line-height: 21px; padding: 5px; left: 620.518px; top: 173.333px;">
    20170712
    <br>
    行业中位数：35,326
    <br>
    支付宝：4
    <br>
</div>

python tag html css xpath

3 answers

prolifes

Posted on 2017-08-11

✓ Accepted answer

from pyquery import PyQuery as Q

html = '''
<div style="position: absolute; display: none; border-style: solid; white-space: nowrap; z-index: 9999999; transition: left 0.4s cubic-bezier(0.23, 1, 0.32, 1), top 0.4s cubic-bezier(0.23, 1, 0.32, 1); background-color: rgba(50, 50, 50, 0.7); border-width: 0px; border-color: rgb(51, 51, 51); border-radius: 4px; color: rgb(255, 255, 255); font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 14px; font-family: &quot;Microsoft YaHei&quot;; line-height: 21px; padding: 5px; left: 620.518px; top: 173.333px;">
    20170712
    <br>
    行业中位数：35,326
    <br>
    支付宝：4
    <br>
</div>
'''
print(Q(html).text())
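Since the question asks about XPath specifically, here is a minimal sketch of the same extraction with lxml. A br element contains no text itself; the text sits in the text nodes around it, which the XPath `text()` step selects. The helper name `br_separated_text` is made up for this illustration, the style attribute is trimmed for brevity, and `//div` assumes the tooltip is the only div in the fragment (a real page would need a more specific path).

```python
import lxml.html

# Shortened copy of the markup from the question (style attribute trimmed).
html = '''
<div style="position: absolute; display: none;">
    20170712
    <br>
    行业中位数：35,326
    <br>
    支付宝：4
    <br>
</div>
'''

def br_separated_text(fragment):
    """Return the non-empty text nodes found directly inside any <div>."""
    doc = lxml.html.document_fromstring(fragment)
    # text() selects the text nodes of the div; the <br> elements between
    # them are not text nodes, so they simply fall away.
    nodes = doc.xpath('//div/text()')
    return [node.strip() for node in nodes if node.strip()]

lines = br_separated_text(html)
print(lines)
```

Each list entry corresponds to one line of the tooltip, so the two values asked about are the second and third items.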

Axton

Posted on 2017-08-13

Updated on 2017-08-13

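With a library such as BeautifulSoup or lxml, you can easily locate the div you need and extract its text; after that, a little post-processing is all it takes.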
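BeautifulSoup version, as a minimal sketch: it assumes the `beautifulsoup4` package is installed and that the target is the first div in the markup (a real page would need a more specific lookup). The style attribute is trimmed for brevity.

```python
from bs4 import BeautifulSoup

raw_html = '''
<div style="position: absolute; display: none;">
    20170712
    <br>
    行业中位数：35,326
    <br>
    支付宝：4
    <br>
</div>
'''

soup = BeautifulSoup(raw_html, 'html.parser')
div = soup.find('div')  # assumption: the tooltip is the first <div>
# stripped_strings yields every text node with surrounding whitespace removed
# and skips whitespace-only nodes; the <br> tags act as the separators.
bs_lines = list(div.stripped_strings)
print(bs_lines)
```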

lxml version:

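import lxml.html

raw_html = 'page HTML goes here'
tree = lxml.html.fromstring(raw_html)
# cssselect() needs the separate cssselect package to be installed
div_obj = tree.cssselect('your CSS selector')[0]
# text_content is a method, so it has to be called
div_text = div_obj.text_content()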

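This gives you the text inside the div, which you can then filter by hand. Note that text_content() drops the tags, so the resulting string contains no literal '<br>' to split on; collecting the div's text nodes one by one does the splitting instead:

fin_text = [t.strip() for t in div_obj.itertext() if t.strip()]

This yields a list with one entry per line, already stripped of the surrounding newlines and indentation.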
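Once the lines have been separated, the "filter by hand" step could look like the sketch below. The function name `parse_tooltip` is hypothetical, and the code assumes what the sample in the question shows: the first line is the date, and every following line is a label and an integer separated by a colon.

```python
def parse_tooltip(lines):
    """Turn ['20170712', '行业中位数：35,326', ...] into (date, {label: int})."""
    date, values = lines[0], {}
    for line in lines[1:]:
        # The sample uses a full-width colon; normalise an ASCII one to match.
        label, _, number = line.replace(':', '：').partition('：')
        # Drop the thousands separator before converting to int.
        values[label.strip()] = int(number.replace(',', '').strip())
    return date, values

date, values = parse_tooltip(['20170712', '行业中位数：35,326', '支付宝：4'])
print(date, values)
```

If the real data contains decimals or missing values, the `int(...)` conversion would need to be adapted.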

attitude

Posted on 2017-08-13

Updated on 2017-08-13
