Help with a regular expression

libraco
170                         <tr>


171               <td align="center">13108 </td>


172               <td class="yl01">15</td><td class="yl01">4</td><td class="yl01">13</td><td class="yl01">7</td><td class="chartBall01">5</td><td class="yl01">2</td><td class="yl01">1</td><td class="yl01">5</td><td class="yl01">7</td><td class="chartBall01">10</td><td class="yl01">4</td><td class="yl01">2</td><td class="chartBall01">13</td><td class="yl01">1</td><td class="chartBall01">15</td><td class="yl01">5</td><td class="yl01">11</td><td class="chartBall01">18</td><td class="yl01">6</td><td class="yl01">12</td><td class="yl01">16</td><td class="yl01">12</td><td class="yl01">19</td><td class="yl01">8</td><td class="yl01">1</td><td class="yl01">6</td><td class="yl01">2</td><td class="yl01">9</td><td class="yl01">4</td><td class="yl01">3</td><td class="yl01">5</td><td class="yl01">7</td><td class="yl01">9</td><td class="yl01">3</td><td class="yl01">14</td><td class="yl02">3</td><td class="chartBall02">2</td><td class="yl02">2</td><td class="yl02">19</td><td class="yl02">1</td><td class="yl02">12</td><td class="yl02">15</td><td class="yl02">11</td><td class="yl02">5</td><td class="yl02">10</td><td class="chartBall02">11</td><td class="yl02">1</td>            </tr>

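The above is part of the content of text (the numbers on the left are line numbers and are not part of text). I want to match it, so I wrote this regular expression: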

pattern = re.compile(r'<tr>\s*<td align="center">[\d\s]{6,}</td>\s*(<td class="(yl01|yl02|chartBall01|chartBall02)">\d+</td>){47}\s+</tr>')
local = re.findall(pattern,text)

It doesn't match anything. Where did I go wrong?

1 answer
✓ Accepted

It does match, doesn't it?

>>> text='''                         <tr>
...                <td align="center">13108 </td>
...                <td class="yl01">15</td><td class="yl01">4</td><td class="yl01">13</td><td class="yl01">7</td><td class="chartBall01">5</td><td class="yl01">2</td><td class="yl01">1</td><td class="yl01">5</td><td class="yl01">7</td><td class="chartBall01">10</td><td class="yl01">4</td><td class="yl01">2</td><td class="chartBall01">13</td><td class="yl01">1</td><td class="chartBall01">15</td><td class="yl01">5</td><td class="yl01">11</td><td class="chartBall01">18</td><td class="yl01">6</td><td class="yl01">12</td><td class="yl01">16</td><td class="yl01">12</td><td class="yl01">19</td><td class="yl01">8</td><td class="yl01">1</td><td class="yl01">6</td><td class="yl01">2</td><td class="yl01">9</td><td class="yl01">4</td><td class="yl01">3</td><td class="yl01">5</td><td class="yl01">7</td><td class="yl01">9</td><td class="yl01">3</td><td class="yl01">14</td><td class="yl02">3</td><td class="chartBall02">2</td><td class="yl02">2</td><td class="yl02">19</td><td class="yl02">1</td><td class="yl02">12</td><td class="yl02">15</td><td class="yl02">11</td><td class="yl02">5</td><td class="yl02">10</td><td class="chartBall02">11</td><td class="yl02">1</td>            </tr>
... '''
>>> import re
>>> pattern = re.compile(r'<tr>\s*<td align="center">[\d\s]{6,}</td>\s*(<td class="(yl01|yl02|chartBall01|chartBall02)">\d+</td>){47}\s+</tr>')
>>> local = re.findall(pattern,text)
>>> local
[('<td class="yl02">1</td>', 'yl02')]

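Although the regex requires the <td> to repeat 47 times, that is only an internal detail of the pattern. What is certain is that the whole regular expression matched exactly once. Because the pattern contains capturing groups, findall returns those groups rather than the full match, and a capturing group that repeats keeps only its last iteration. That is why the result holds just the final <td> and its class.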
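A minimal sketch of that behaviour on a toy string (the `sample` value and the `a\d` patterns are made up for illustration, not taken from the question):

```python
import re

sample = 'a1a2a3'

# One capturing group, repeated three times: the whole pattern matches once,
# and the group keeps only its final iteration.
one_group = re.findall(r'(a\d){3}', sample)
print(one_group)   # ['a3']

# Two capturing groups: findall returns one tuple of groups per match.
two_groups = re.findall(r'(a(\d)){3}', sample)
print(two_groups)  # [('a3', '3')]

# No capturing groups: findall returns the full matched text.
no_groups = re.findall(r'(?:a\d){3}', sample)
print(no_groups)   # ['a1a2a3']
```

The second case has the same shape as the result in the session above: one match, reported as a tuple of the last value each group captured.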

To get all 47 items, wrap (<td class="(yl01|yl02|chartBall01|chartBall02)">\d+</td>){47} in one more pair of parentheses. Once you have pulled out that outer group, run findall over it with the regex for a single item.
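The two-step approach could look roughly like this. It is a sketch under assumptions: the row is shortened to 3 cells instead of 47 to keep it readable, the inner groups are made non-capturing so that only the outer group is returned, and the names `cell`, `row_pattern` and `cell_pattern` are illustrative.

```python
import re

# Shortened stand-in for the row in the question: 3 cells instead of 47.
text = '''<tr>
    <td align="center">13108 </td>
    <td class="yl01">15</td><td class="chartBall01">5</td><td class="yl02">1</td>    </tr>'''

# A single cell, with a non-capturing group for the class alternatives.
cell = r'<td class="(?:yl01|yl02|chartBall01|chartBall02)">\d+</td>'

# Step 1: one outer capturing group around the whole run of cells.
# Use {47} instead of {3} for the real data.
row_pattern = re.compile(
    r'<tr>\s*<td align="center">[\d\s]{6,}</td>\s*((?:' + cell + r'){3})\s+</tr>'
)

# Step 2: a pattern for one cell, capturing its class and its number.
cell_pattern = re.compile(
    r'<td class="(yl01|yl02|chartBall01|chartBall02)">(\d+)</td>'
)

rows = row_pattern.findall(text)                      # one string per matched row
cells = [cell_pattern.findall(row) for row in rows]   # (class, number) pairs per row
print(cells)
# [[('yl01', '15'), ('chartBall01', '5'), ('yl02', '1')]]
```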

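Also, don't wear yourself out writing regexes to pull content out of HTML. It is thankless work. For parsing HTML, switch to the BeautifulSoup4 library sooner rather than later.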
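To show what parser-based extraction looks like, here is a sketch that uses Python's standard-library `html.parser` instead of BeautifulSoup4, so that it runs without installing anything. BeautifulSoup4 offers a higher-level interface for the same job. The `CellCollector` class and the shortened 3-cell row are assumptions made for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect (class, text) pairs for every <td> that has a class attribute."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._current_class = None  # class of the <td> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; cells without a class give None.
            self._current_class = dict(attrs).get("class")

    def handle_data(self, data):
        # Ignore text outside classed cells and whitespace-only text.
        if self._current_class is not None and data.strip():
            self.cells.append((self._current_class, data.strip()))

    def handle_endtag(self, tag):
        if tag == "td":
            self._current_class = None


html = '''<tr>
    <td align="center">13108 </td>
    <td class="yl01">15</td><td class="chartBall01">5</td><td class="yl02">1</td>    </tr>'''

collector = CellCollector()
collector.feed(html)
collector.close()
print(collector.cells)
# [('yl01', '15'), ('chartBall01', '5'), ('yl02', '1')]
```

Unlike the regex, this does not depend on the exact whitespace between tags or on the number of cells in a row.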
