Running a pyspider script raises AttributeError


I'm new to Python. While writing a crawler with pyspider, my script raised an AttributeError.

Here is the script:

#!/usr/bin/python
#coding:utf-8

from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    crawl_config = {}

    def on_start(self):
        self.crawl('http://scrapy.org', callback=self.index_page)

    def index_page(self, response):
        url_list = []
        for each in response.doc('a[href^="http"]').items():
            url_list.append(each.attr.href)  # PyQuery exposes attributes via .attr
        return url_list

    @config(priority=5)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }

    def on_result(self, result):
        if not result:
            return
        assert self.task, "on_result can't outside a callback."
        result['callback'] = self.task['process']['callback']
        print(result)

if __name__=='__main__':
    handler = Handler()
    handler.on_start()

Here is the error:

Traceback (most recent call last):
  File "./firstScript.py", line 34, in <module>
    handler.on_start()
  File "./firstScript.py", line 10, in on_start
    self.crawl('http://scrapy.org', callback=self.index_page)
  File "/usr/lib/python2.7/site-packages/pyspider/libs/base_handler.py", line 394, in crawl
    return self._crawl(url, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyspider/libs/base_handler.py", line 338, in _crawl
    if cache_key not in self._follows_keys:
AttributeError: 'Handler' object has no attribute '_follows_keys'

I searched Baidu and Google for a long time without finding a fix. Could someone take a look and tell me what's causing this? Thanks!

Answers

You can't run the spider like this:

if __name__=='__main__':
    handler = Handler()
    handler.on_start()

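There's no need for an if block to start this class: pyspider loads and drives the Handler class itself. When you instantiate it and call on_start() yourself, the setup pyspider normally performs before running a callback never happens. That setup includes initializing the internal _follows_keys set that self.crawl() uses to de-duplicate queued requests, which is why you get the AttributeError. Run the script through pyspider instead, for example as a project in the pyspider WebUI, rather than as a standalone Python program.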
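To make the failure mode concrete, here is a simplified model. It is an assumption for illustration, loosely modeled on pyspider's base_handler.py but not its actual source: per-task state such as _follows_keys is created in a reset step that the framework runs before invoking a callback, not in __init__. MiniBaseHandler, _reset and run_task below are hypothetical stand-ins. Calling on_start() on a bare instance fails the same way as the traceback, while going through the framework-style entry point works.

```python
# Simplified, HYPOTHETICAL model of how a framework like pyspider prepares a
# handler before running a callback. MiniBaseHandler, _reset and run_task are
# illustrative stand-ins, not pyspider's real implementation.


class MiniBaseHandler:
    def _reset(self):
        # Per-task state is created here, not in __init__.
        self._follows = []
        self._follows_keys = set()

    def crawl(self, url, callback=None):
        # Only queues a task (with de-duplication); nothing is fetched here.
        if url in self._follows_keys:  # AttributeError if _reset() never ran
            return None
        self._follows_keys.add(url)
        task = {"url": url, "callback": callback.__name__ if callback else None}
        self._follows.append(task)
        return task

    def run_task(self, callback_name):
        # What the framework does: reset state first, then call the callback.
        self._reset()
        getattr(self, callback_name)()
        return self._follows


class Handler(MiniBaseHandler):
    def on_start(self):
        self.crawl("http://scrapy.org", callback=self.index_page)

    def index_page(self):
        pass


if __name__ == "__main__":
    try:
        Handler().on_start()  # direct call, as in the question
    except AttributeError as exc:
        print("direct call:", exc)

    print("via run_task:", Handler().run_task("on_start"))
```

The direct call reports that 'Handler' object has no attribute '_follows_keys', while run_task returns the single queued task. In real pyspider you never call run_task yourself; the scheduler and processor do that once the script is loaded as a project.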
