Background: Scrapy
Description:
# items.jl is in the same directory as the spider.
def start_requests(self):
    with open('items.jl', 'rb') as urls:
        for url in urls:
            print url
            link = eval(url)
            yield Request(link['url'], dont_filter=False, callback=self.parse, headers=self.headers)
Each line of the items.jl file is a JSON object:
{"url": "http://onlinelibrary.wiley.com/getIdentityKey?redirectTo=http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1002%2Fanie.201509111%2Ffull%3Fwol1URL%3D%2Fdoi%2F10.1002%2Fanie.201509111%2Ffull&userIp=112.65.190.171&doi=10.1002%2Fanie.201509111"}
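As a side note, since each line of items.jl is a JSON object, the standard-library json module is a safer way to parse it than eval, which executes the line as Python code. A minimal sketch (Python 3 syntax; the line below is a shortened, illustrative example, not the full URL from items.jl):

```python
import json

# One line from items.jl (shortened here for illustration).
line = '{"url": "http://onlinelibrary.wiley.com/doi/10.1002/anie.201509111/full"}'

# json.loads parses the JSON text into a dict without executing it,
# so a malformed or malicious line cannot run arbitrary code.
link = json.loads(line)
print(link['url'])
```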
Problem:
The URL never gets printed; it looks as if the loop is never entered.
Addendum:
# items.jl is in the same directory as the spider.
with open('items.jl', 'rb') as urls:
    for url in urls:
        print url
Run on its own like this, the URL does get printed.
.............Addendum (source of the code)...............................
http://stackoverflow.com/questions/9322219/how-to-generate-the-start-urls-dynamically-in-crawling/10379463#10379463
What you have defined there is a generator.
Its body only runs once you iterate over it;
calling start_requests() by itself does nothing.