Python regex: how do I match two possible patterns?

Posted 2017-08-10, updated 2017-08-10

This is the HTML around the part I need to extract:

"color:#ff0000">均价：</span></strong></td>
<td style="border-bottom:#000 1px solid;border-left:#000;background-color:transparent;width:131px;border-top:#000;border-right:#000 1px solid" class="xl70" width="130"><strong><span style="color:#ff0000">12862</span></stong></td>
<td style="border-bottom:#000 1px solid;border-left:#000;background-color:transparent;width:83px;border-top:#000;border-right:#000 1px solid" class="xl77" width="82"><strong><span style="color:#ff0000">+240</span>

The values I want to extract are 12862 and +240. For a single page, the pattern

pattern = r'均价：<.*?00;">(.*?)</span.*?00;">(.*?)</span'

works fine.

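Now I need to extract these values in bulk from a series of similar pages.
But the site's markup is inconsistent: sometimes <strong> is inside <span>, and sometimes it is outside.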

How should I write the pattern so that it handles both cases?

python

2 answers

wawor4827

Posted 2017-08-10

For irregular markup like this, use BeautifulSoup instead of a regex:

# coding=utf-8

from bs4 import BeautifulSoup



html = """
color:#ff0000">均价：</span></strong></td>
<td style="border-bottom:#000 1px solid;border-left:#000;background-color:transparent;width:131px;border-top:#000;border-right:#000 1px solid" class="xl70" width="130"><strong><span style="color:#ff0000">12862</span></stong></td>
<td style="border-bottom:#000 1px solid;border-left:#000;background-color:transparent;width:83px;border-top:#000;border-right:#000 1px solid" class="xl77" width="82"><strong><span style="color:#ff0000">+240</span>
"""


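# Parse with lxml (must be installed), which tolerates the malformed markup
soup = BeautifulSoup(html, "lxml")

# Collect every <span>, however it is nested with <strong>
spans = soup.select("span")

for s in spans:
    print(s.get_text())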
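
If a regex is still preferred, one possible sketch is to stop matching specific tags and instead skip any run of tags between the label and each value, so the nesting order of <strong> and <span> no longer matters. The names `TAGS`, `PATTERN`, and `extract_avg_price`, as well as the two shortened sample strings, are made up for illustration and are not from the original page. The sketch also assumes that no attribute value contains a `>` character.

```python
import re

# Assumption: a "tag" is anything between < and > with no > inside it.
# TAGS skips one or more tags plus surrounding whitespace, in any order.
TAGS = r'(?:\s*<[^>]*>)+\s*'

# Label, run of tags, first value, run of tags, second value, next tag.
PATTERN = re.compile(r'均价：' + TAGS + r'([^<]+?)' + TAGS + r'([^<]+?)\s*<')


def extract_avg_price(html):
    """Return (price, change) as strings, or None if the label is not found."""
    m = PATTERN.search(html)
    return m.groups() if m else None


# Hypothetical, shortened samples covering both nesting orders.
strong_inside = ('<span>均价：</span></td>'
                 '<td><span><strong>12862</strong></span></td>'
                 '<td><span><strong>+240</strong></span></td>')
strong_outside = ('<span>均价：</span></td>\n'
                  '<td><strong><span>12862</span></strong></td>\n'
                  '<td><strong><span>+240</span></strong></td>')

print(extract_avg_price(strong_inside))   # ('12862', '+240')
print(extract_avg_price(strong_outside))  # ('12862', '+240')
```

Because the pattern uses `[^>]` and `\s` rather than `.`, it also matches across line breaks without needing `re.S`. For pages that are more badly broken than this, the BeautifulSoup approach above is likely to be the more robust choice.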