For example, suppose there are 10 URLs:
http://www.baidu.com/userid=1
http://www.baidu.com/userid=2
http://www.baidu.com/userid=3
...
http://www.baidu.com/userid=10
and each page's content looks like:
{
  "data": {
    "1": {
      "uid": "1",
      "phone": "13000000000",
      "sex": "1"
    }
  },
  "code": 1,
  "msg": "1"
}
{
  "data": {
    "2": {
      "uid": "2",
      "phone": "13000000001",
      "sex": "1"
    }
  },
  "code": 1,
  "msg": "1"
}
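For reference, the fields in a payload shaped like the samples above can be pulled out with Python's standard json module (a standalone sketch, independent of pyspider; the sample values are copied from the first payload):

```python
import json

# Sample payload in the same shape as the pages above
payload = '''
{
  "data": {
    "1": {
      "uid": "1",
      "phone": "13000000000",
      "sex": "1"
    }
  },
  "code": 1,
  "msg": "1"
}
'''

doc = json.loads(payload)
# "data" is keyed by the userid string, and each page holds one record,
# so take its first (and only) entry
record = next(iter(doc["data"].values()))
print(record["uid"], record["phone"], record["sex"])
```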
I'm new to pyspider and still haven't gotten the hang of it after reading a lot of material. I found a way to enqueue all the URLs, but I don't know how to extract the data from the pages. Could anyone help me out? Thanks!
from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    def __init__(self):
        self.base_url = 'http://www.baidu.com/userid='
        self.uid_num = 1
        self.total_num = 10

    @every(minutes=24 * 60)
    def on_start(self):
        # enqueue one crawl task per userid
        while self.uid_num <= self.total_num:
            url = self.base_url + str(self.uid_num)
            print(url)
            self.crawl(url, callback=self.index_page)
            self.uid_num += 1
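The missing piece is the `index_page` callback that `self.crawl` points at. In pyspider, the response object exposes the parsed body as `response.json`, so the callback can read the fields directly; whatever dict the callback returns is saved as the result. A minimal sketch, assuming the payload shape shown above (this would sit inside the Handler class):

```python
def index_page(self, response):
    # response.json is the parsed JSON body of the page (pyspider
    # parses it automatically when the content is JSON)
    data = response.json.get('data', {})
    # each page holds a single record, keyed by the userid string
    record = next(iter(data.values()))
    # the returned dict becomes the saved result for this task
    return {
        'uid': record['uid'],
        'phone': record['phone'],
        'sex': record['sex'],
    }
```

pyspider writes each returned dict to its result database, so running the spider over all 10 URLs would collect one record per userid.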