Why does my pyspider crawler show nothing in results?

Question

Asked by YHYHYHHH on 2018-01-21, edited by dokelung on 2018-01-21

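I'm using pyspider to crawl job postings from 51job. When I click run on the script's code page to test it, the output is correct. But when I go back to the dashboard and run the project from there, nothing ever shows up in results. This happens every time. Could someone take a look?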

Code:

```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://jobs.51job.com/', callback=self.index_page, validate_cert=False, age=0)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # links found on the start page; each leads to a listing page
        for each in response.doc('.e5 .lkst a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=2)
    def detail_page(self, response):
        # job links on a listing page
        for each in response.doc('.e .info .title a').items():
            self.crawl(each.attr.href, callback=self.detail_page_next, validate_cert=False, age=0, retries=3)
        # further listing pages, followed with this same callback
        for each in response.doc('.bk a').items():
            print("deep")
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=1)
    def detail_page_next(self, response):
        # the returned dict is what gets stored in results
        return {
            "公司": response.doc('.cname').text(),        # company
            "公司规模": response.doc('.ltype').text(),    # company size
            "职位": response.doc('h1').text(),            # job title
            "薪资": response.doc('.cn strong').text(),    # salary
            "描述": response.doc('.job_msg').text(),      # description
            "地点": response.doc('.lname').text(),        # location
        }
```
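One thing worth double-checking in callbacks like these is indentation. A `self.crawl(...)` that should run once per link has to sit inside the `for` loop; if it is placed after the loop, it runs only once, with the loop variable still bound to the last item, so only the last link is followed. And when a callback reuses `each` across two loops, an empty second loop leaves `each` pointing at the last item of the first loop. The plain-Python sketch below illustrates both cases; the function names are hypothetical and no pyspider calls are involved.

```python
def follow_after_loop(links):
    """The follow-up call sits after the loop, so it runs exactly once."""
    followed = []
    for each in links:
        pass  # loop body, e.g. print("deep")
    followed.append(each)  # `each` is still bound to the LAST link here
    return followed


def follow_inside_loop(links):
    """The follow-up call is indented into the loop, so it runs once per link."""
    followed = []
    for each in links:
        followed.append(each)
    return followed


def leftover_variable(job_links, page_links):
    """Two loops share one variable; an empty second loop leaves `each` unchanged."""
    for each in job_links:
        pass
    for each in page_links:
        pass
    return each


pages = ["page2", "page3", "page4"]
print(follow_after_loop(pages))                 # ['page4']
print(follow_inside_loop(pages))                # ['page2', 'page3', 'page4']
print(leftover_variable(["job1", "job2"], []))  # job2
```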

Test run on the code page (output is correct):
[screenshot]

Dashboard:
[screenshot]

results:
[screenshot]

Accepted answer by Yujiaao, 2018-01-22

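Try the script below. It renames the callbacks so that `detail_page` is the one that returns results, and gives it `priority=2`, higher than the listing-page callback `index_page` (`priority=1`). Since pyspider schedules higher-priority tasks first, job detail pages are fetched ahead of the remaining listing pages, so results should start appearing much sooner.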

```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-01-22 12:13:12
# Project: 51job

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://jobs.51job.com/', callback=self.main_index, validate_cert=False, age=0)

    @config(age=10 * 24 * 60 * 60)
    def main_index(self, response):
        # links found on the start page; each leads to a listing page
        for each in response.doc('.e5 .lkst a').items():
            self.crawl(each.attr.href, callback=self.index_page, validate_cert=False, age=0)

    @config(priority=1)
    def index_page(self, response):
        # job links on a listing page go to detail_page (higher priority)
        for each in response.doc('.e .info .title a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0, retries=3)
        # further listing pages, followed with this same callback
        for each in response.doc('.bk a').items():
            print("deep")
            self.crawl(each.attr.href, callback=self.index_page, validate_cert=False, age=0)

    @config(priority=2)
    def detail_page(self, response):
        # the returned dict is what gets stored in results
        return {
            "公司": response.doc('.cname').text(),        # company
            "公司规模": response.doc('.ltype').text(),    # company size
            "职位": response.doc('h1').text(),            # job title
            "薪资": response.doc('.cn strong').text(),    # salary
            "描述": response.doc('.job_msg').text(),      # description
            "地点": response.doc('.lname').text(),        # location
        }
```
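Why the priority swap matters can be illustrated with a toy model. The sketch below is a simplified, single-threaded stand-in for a scheduler, not pyspider's actual implementation: it assumes one queue where a higher `priority` value is dequeued first (pyspider documents priority as "higher the better") and equal priorities are served in FIFO order, and it ignores concurrency, rate limits and retries. The page counts (50 listing pages, 20 jobs each) are made up for illustration. Each listing page enqueues its job pages plus the next listing page, and the function reports at which fetch the first result-producing job page is reached.

```python
import heapq
import itertools


def first_result_step(listing_priority, job_priority, listing_pages=50, jobs_per_page=20):
    """Return the fetch number at which the first job (result) page is processed.

    Toy model only: one queue, higher priority first, FIFO among equal
    priorities. Not pyspider's real scheduler.
    """
    order = itertools.count()  # tie-breaker: earlier tasks first among equal priorities
    queue = []

    def enqueue(priority, kind, page):
        # heapq is a min-heap, so store -priority to pop the HIGHEST priority first
        heapq.heappush(queue, (-priority, next(order), kind, page))

    enqueue(listing_priority, "listing", 0)
    fetches = 0
    while queue:
        _, _, kind, page = heapq.heappop(queue)
        fetches += 1
        if kind == "job":
            return fetches  # a job page would produce a result here
        # a listing page yields its job links plus a link to the next listing page
        for _ in range(jobs_per_page):
            enqueue(job_priority, "job", page)
        if page + 1 < listing_pages:
            enqueue(listing_priority, "listing", page + 1)
    return None


# Question's settings: listing callback priority=2, result callback priority=1
print(first_result_step(listing_priority=2, job_priority=1))  # 51
# Answer's settings: listing callback priority=1, result callback priority=2
print(first_result_step(listing_priority=1, job_priority=2))  # 2
```

With the question's settings every listing page is drained before the first job page is touched; with the answer's settings a job page is reached on the second fetch. On a site with many listing pages that backlog can be large, so results may stay empty for a long time even though the script itself works.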
