Here is my code, but my regular expression still doesn't work. Can anyone help?
#coding=utf-8
import urllib2;
import re;

def getProvince(mainUrl):
    req = urllib2.Request(mainUrl);
    resp = urllib2.urlopen(req);
    respHtml = resp.read();
    # print "respHtml",respHtml;
    #<a href="/lelist/listxian.aspx?id=D44C1502B7D5BEA1" class="cunpaddingl4">安徽</a>
    #re.search('<h1\s+?class="h1user">(?P<h1user>.+?)</h1>', respHtml);
    foundA_lable = re.search('<a\s+?class=cunpaddingl4>(?P<cunpaddingl4>.+?)</a>',respHtml);
    print "foundA_lable =",foundA_lable;
    if foundA_lable:
        province = foundA_lable.group("cunpaddingl4");
        print u"cunpaddingl4 =",province;
    else :
        print u"No data matched";

print getProvince("http://www.yigecun.com/");
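I tweaked your code a little. The main change is to use re.findall(), which returns a list, and then loop over that list to print each item. This is also easy to do with bs4.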
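For reference, here is a minimal sketch of the re.findall() approach. It is not the answerer's exact code. The original pattern fails because it expects `<a` to be followed directly by an unquoted `class=cunpaddingl4`, while the real tag has an `href` attribute first and quotes around the class value. `SAMPLE_HTML` is a made-up stand-in for `respHtml` (only the first link comes from the question), and `get_provinces` and `PROVINCE_RE` are hypothetical names. The sketch is written for Python 3 and skips the network request.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for respHtml. The first link is the one quoted in the question;
# the second link and its id are invented for illustration.
SAMPLE_HTML = (
    '<a href="/lelist/listxian.aspx?id=D44C1502B7D5BEA1" class="cunpaddingl4">安徽</a>'
    '<a href="/lelist/listxian.aspx?id=0000000000000000" class="cunpaddingl4">北京</a>'
)

# Allow other attributes (such as href) before class, and match the quoted value.
PROVINCE_RE = re.compile(
    r'<a\s+[^>]*?class="cunpaddingl4"[^>]*>(?P<province>.+?)</a>'
)


def get_provinces(html):
    """Return the link text of every <a class="cunpaddingl4"> tag in html."""
    # With a single capture group, findall returns a list of that group's matches.
    return PROVINCE_RE.findall(html)


provinces = get_provinces(SAMPLE_HTML)
for province in provinces:
    print(province)
```

If the page is fetched under Python 3, `resp.read()` returns bytes, so it would need decoding to text before this pattern is applied.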