Extracting text with a regular expression

>>> str=''' <td>
...                                     
...                                         
...                                         应用推广
...                                     
...                                 </td>
...                                 <td>
...                                     
...                                         大图广告
...                                         
...                                         
...                                         
...                                     
...                                 </td>
...                                 <td>
...                                     信息流大图D16
...                                 </td>'''


>>> import re
>>> s=re.search('<td>.*?</td>.*?<td>.*?</td>.*?<td>(.*?)</td>',str,re.S).group(1)
>>> a
Traceback (most recent call last):
  File "<input>", line 1, in <module>
NameError: name 'a' is not defined
>>> s
'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'
>>> s.strip(" ")
'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'

How can I make the regex match only "信息流大图D16", without the surrounding whitespace (the \t and \n characters)?

2 answers

Just use re.sub to replace them:

In [9]: a = u'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'

In [10]: import re

In [11]: a1 = re.sub("\t","",a)

In [12]: a2 = re.sub("\n","",a1)
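A shorter variant of the same idea, as a rough sketch assuming Python 3 and a sample string `s` that mirrors the captured group from the question (the variable names here are made up for illustration). Note that `s.strip(" ")` in the question only trims spaces, which is why the tabs and newlines survived; `strip()` with no argument trims all leading and trailing whitespace.

```python
import re

# Hypothetical sample mirroring the captured group from the question.
s = '\n\t\t\t\t\t\t\t\t\t信息流大图D16\n\t\t\t\t\t\t\t\t'

# strip() with no argument trims every kind of leading/trailing whitespace,
# whereas strip(" ") trims only the space character.
trimmed = s.strip()

# A single re.sub call with a character class removes tabs and newlines
# anywhere in the string, replacing the two chained calls.
collapsed = re.sub(r'[\t\n]', '', s)

print(trimmed)
print(collapsed)
```

If the text could contain tabs or newlines in the middle that should be kept, `strip()` is probably the safer of the two.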


re.search('(信)(.*?)(6)',str).group()
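Another option that does not depend on knowing the first and last characters of the text: let the pattern itself swallow the whitespace by putting `\s*` on both sides of the capture group. This is only a sketch, assuming Python 3; `html`, `pattern`, `third`, and `cells` are illustrative names, and the sample is a shortened stand-in for the markup in the question (it also avoids naming a variable `str`, which shadows the built-in type).

```python
import re

# Shortened stand-in for the markup in the question.
html = '''<td>
    应用推广
</td>
<td>
    大图广告
</td>
<td>
\t\t信息流大图D16
</td>'''

# \s* before and after the group consumes the padding, so the lazy
# group captures only the text itself. re.S lets . match newlines.
pattern = r'<td>.*?</td>.*?<td>.*?</td>.*?<td>\s*(.*?)\s*</td>'
third = re.search(pattern, html, re.S).group(1)

# The same trick with findall returns every cell, already trimmed.
cells = re.findall(r'<td>\s*(.*?)\s*</td>', html, re.S)

print(third)
print(cells)
```

For anything beyond a small, regular snippet like this one, an HTML parser is usually more robust than a regex.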