>>> str=''' <td>
...
...
... 应用推广
...
... </td>
... <td>
...
... 大图广告
...
...
...
...
... </td>
... <td>
... 信息流大图D16
... </td>'''
>>> import re
>>> s=re.search('<td>.*?</td>.*?<td>.*?</td>.*?<td>(.*?)</td>',str,re.S).group(1)
>>> a
Traceback (most recent call last):
File "<input>", line 1, in <module>
NameError: name 'a' is not defined
>>> s
'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'
>>> s.strip(" ")
'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'
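How can I get the regex to match only "信息流大图D16", without the surrounding whitespace (the \t and \n characters)?
Just replace them with re.sub: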
In [9]: a = u'\n\t\t\t\t\t\t\t\t\t\xe4\xbf\xa1\xe6\x81\xaf\xe6\xb5\x81\xe5\xa4\xa7\xe5\x9b\xbeD16\n\t\t\t\t\t\t\t\t'
In [10]: import re
In [11]: a1 = re.sub("\t","",a)
In [12]: a2 = re.sub("\n","",a1)
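Two other approaches may be worth trying; this is only a rough sketch, not the one right way. It assumes Python 3 (the session above is Python 2), and the `html` sample and the names `raw`, `stripped`, `direct` and `cells` are made up for illustration. The `s.strip(" ")` call in the question changes nothing because it strips only space characters. Calling `strip()` with no argument removes tabs and newlines as well. Alternatively, putting `\s*` just outside the capture group keeps the whitespace out of the match in the first place.

```python
import re

# Sample markup modelled on the question; the exact indentation is an assumption.
html = """<td>
\t\t应用推广
</td>
<td>
\t\t大图广告

</td>
<td>
\t\t信息流大图D16
\t</td>"""

# Option 1: same pattern as in the question, then strip every kind of
# surrounding whitespace (spaces, tabs, newlines) with a bare strip().
raw = re.search(r'<td>.*?</td>.*?<td>.*?</td>.*?<td>(.*?)</td>', html, re.S).group(1)
stripped = raw.strip()

# Option 2: let \s* consume the whitespace outside the capture group,
# so the group holds only the cell text.
direct = re.search(r'<td>.*?</td>.*?<td>.*?</td>.*?<td>\s*(.*?)\s*</td>', html, re.S).group(1)

# Option 3: collect every cell and strip each one, then pick by index.
cells = [c.strip() for c in re.findall(r'<td>(.*?)</td>', html, re.S)]

print(stripped)
print(direct)
print(cells[2])
```

Unlike chaining `re.sub` calls, none of these touch whitespace inside the cell text, which matters if a cell can contain tabs or line breaks you want to keep.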