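I'm using pyspider to crawl job postings from 51job. When I click run on the script editing page to test it, the output is correct, but after I go back to the dashboard and run the project, results
stays empty. This happens every time. Could someone please take a look?
Code: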
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        # Entry point: the 51job listing index.
        self.crawl('http://jobs.51job.com/', callback=self.index_page, validate_cert=False, age=0)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow each category link to its job list.
        for each in response.doc('.e5 .lkst a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=2)
    def detail_page(self, response):
        # Job list page: follow every posting on this page ...
        for each in response.doc('.e .info .title a').items():
            self.crawl(each.attr.href, callback=self.detail_page_next, validate_cert=False, age=0, retries=3)
        # ... and the pagination links, which lead back to this same callback.
        for each in response.doc('.bk a').items():
            print("deep")
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=1)
    def detail_page_next(self, response):
        # Single posting: the returned dict is what shows up under results.
        return {
            "公司": response.doc('.cname').text(),      # company
            "公司规模": response.doc('.ltype').text(),  # company size
            "职位": response.doc('h1').text(),          # job title
            "薪资": response.doc('.cn strong').text(),  # salary
            "描述": response.doc('.job_msg').text(),    # description
            "地点": response.doc('.lname').text(),      # location
        }
The test run on the script editing page is correct:
Dashboard:
results:
Try the script below. Setting priority=2 on detail_page makes the results show up sooner.
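
For intuition on why priority changes how soon results show up, here is a small standalone toy model. It is not pyspider code and does not use the pyspider scheduler. It only assumes what the pyspider docs state for self.crawl, that tasks with a higher priority value are scheduled first. The function first_result_position and its parameters are made up for this illustration.

```python
import heapq
from itertools import count


def first_result_position(list_priority, detail_priority, pages=3, jobs_per_page=2):
    """Return how many tasks have run when the first job-detail task runs.

    Toy priority scheduler: the task with the highest priority is popped
    first, and ties are broken in insertion order.
    """
    order = count()
    queue = []

    def push(priority, kind, page):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(queue, (-priority, next(order), kind, page))

    push(list_priority, "list", 1)
    processed = 0
    while queue:
        _, _, kind, page = heapq.heappop(queue)
        processed += 1
        if kind == "detail":
            # A detail task is the one that would return a result.
            return processed
        # A list page queues its job links and, if there is one, the next list page.
        for _ in range(jobs_per_page):
            push(detail_priority, "detail", page)
        if page < pages:
            push(list_priority, "list", page + 1)
    return None


# List pages outrank detail pages: every list page is drained first.
print(first_result_position(list_priority=2, detail_priority=1))  # 4
# Detail pages outrank list pages: the first result comes right away.
print(first_result_position(list_priority=1, detail_priority=2))  # 2
```

In this model with only three list pages the gap is small, but on a site with thousands of list pages the first case would keep results empty for a long time, because the handler that returns the dict is always at the back of the queue.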