Some questions and doubts about Scrapy

Question

622034

Posted on
2018-05-02

My proxy middleware (registered in settings with priority 544; the default one is set to None):

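import logging

import requests


class IpProxyMiddleware(object):
    def __init__(self, ip=''):
        self.ip = ip

    def process_request(self, request, spider):
        # Fetch a fresh 'host:port' from the local proxy pool (blocking call);
        # strip() guards against a trailing newline in the API response.
        self.ip = requests.get('http://localhost:5555/random').text.strip()
        logging.info('Current proxy IP: ' + self.ip)
        request.meta['proxy'] = 'http://' + self.ip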
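
For reference, the settings entry described above would look roughly like the following. This is only a sketch: `myproject.middlewares` is a placeholder module path, and it assumes the "default" that was set to None is Scrapy's built-in HttpProxyMiddleware, which the post does not actually say.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    # 'myproject.middlewares' is a placeholder; use the real module path.
    'myproject.middlewares.IpProxyMiddleware': 544,
    # Assumption: this is the built-in middleware the post disabled with None.
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
}
```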

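Doubt: every time I create a Request with a URL and a callback, is process_request executed, so that the API is called once to fetch a proxy IP from the local pool? That is what the docs say: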

This method is called for each request that goes through the download middleware.
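
As the quoted documentation says, process_request runs once for every request that passes through the downloader middleware, so a middleware that calls the proxy API inside it makes one API call per request. The sketch below makes that visible with two counters and only hits the pool when no proxy is cached or when the request carries `change_proxy` in its meta. The class name, the counters, and the `fetch_proxy` parameter are all hypothetical and only for illustration; the pool URL is the one from the question.

```python
from urllib.request import urlopen


def fetch_proxy_from_pool(url='http://localhost:5555/random'):
    """Ask the local proxy pool for one 'host:port' string (blocking call)."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode('utf-8').strip()


class CountingProxyMiddleware:
    """Hypothetical variant of IpProxyMiddleware that records how often it runs."""

    def __init__(self, fetch_proxy=fetch_proxy_from_pool):
        self.fetch_proxy = fetch_proxy  # callable returning 'host:port'
        self.ip = None
        self.requests_seen = 0  # how many times process_request ran
        self.pool_calls = 0     # how many times the proxy API was hit

    def process_request(self, request, spider):
        self.requests_seen += 1
        # Only hit the pool when there is no proxy yet or a change was requested.
        if self.ip is None or request.meta.get('change_proxy'):
            self.ip = self.fetch_proxy()
            self.pool_calls += 1
        request.meta['proxy'] = 'http://' + self.ip
        return None  # let the request continue through the remaining middlewares
```

Comparing `requests_seen` with `pool_calls` in the log is a quick way to confirm how often the method fires.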

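Question: in the parse callback, how can I switch to a new proxy IP and retry when the response status is not 200? Is there anything wrong with the code below? I tried it and it still has no effect, and I don't know where to start checking.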

        if response.status != 200:
            logging.error('-------- IP has been banned! Retrying... --------')
            yield Request(url=response.url, meta={'change_proxy': True}, callback=self.followees_parse)
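
Two things are worth checking here. By default Scrapy's HttpErrorMiddleware drops non-2xx responses before they reach the callback, so the `response.status != 200` branch may never run unless the spider lists those codes in `handle_httpstatus_list`. A second Request for the same URL is also discarded by the duplicate filter unless it is created with `dont_filter=True`. One possible alternative, sketched below under those assumptions, is to do the retry in a downloader middleware's process_response instead of in the callback. The class name, `MAX_PROXY_RETRIES`, and the `proxy_retries` meta key are hypothetical.

```python
MAX_PROXY_RETRIES = 3  # hypothetical limit so a dead proxy pool cannot loop forever


class RetryWithNewProxyMiddleware:
    """Hypothetical downloader middleware: re-issue non-200 responses with a new proxy."""

    def process_response(self, request, response, spider):
        if response.status == 200:
            return response  # pass good responses on towards the spider

        retries = request.meta.get('proxy_retries', 0)
        if retries >= MAX_PROXY_RETRIES:
            return response  # give up and let Scrapy handle the error response

        # dont_filter=True keeps the duplicate filter from dropping the repeated URL.
        retry = request.replace(dont_filter=True)
        retry.meta['change_proxy'] = True
        retry.meta['proxy_retries'] = retries + 1
        return retry  # returning a Request makes Scrapy reschedule it
```

The rescheduled request passes through process_request again, so the proxy middleware from the question would assign it a newly fetched proxy. Like any downloader middleware, this one would need its own entry in DOWNLOADER_MIDDLEWARES to take effect.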

python

1 Answer

lyg4795

6912033

Posted on
2018-05-06

Try debugging it: set a breakpoint and step through the code.
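
If stepping through with a debugger is awkward inside a running crawl, a logging middleware is one lightweight way to see what is going on. The sketch below is a hypothetical helper, not part of the original code: it logs the status, proxy, and URL of every response and passes the response through unchanged.

```python
import logging

logger = logging.getLogger(__name__)


class TraceProxyMiddleware:
    """Hypothetical debugging aid: log which proxy each request used and what came back."""

    def process_response(self, request, response, spider):
        logger.info('status=%s proxy=%s url=%s',
                    response.status, request.meta.get('proxy'), request.url)
        # A breakpoint could go here instead, e.g. `import pdb; pdb.set_trace()`.
        return response  # leave the response untouched
```

If a non-200 status shows up in this log but the callback's error branch never logs anything, the response is being filtered out before it reaches the spider.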
