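I have a Scrapy project that contains multiple spiders. Is there a way to define which pipelines are used for which spider? Not all of the pipelines I have defined apply to every spider. Thanks.

Original by CodeMonkeyB; translation licensed under CC BY-SA 4.0.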

How to use different pipelines for different spiders in a single Scrapy project

2 answers

Posted on
2023-01-09

✓ Accepted

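Building on Pablo Hoffman's solution, you can put the following decorator on the process_item method of a Pipeline object so that it checks the spider's pipeline attribute to decide whether that pipeline should run. For example: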

import functools
import logging


def check_spider_pipeline(process_item_method):
    """Run process_item only for spiders that list this pipeline class."""

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=logging.DEBUG)
            return item

    return wrapper

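For this decorator to work, the spider must have a pipeline attribute holding a container of the Pipeline classes you want to process its items with, for example: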

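import scrapy

# "myproject" is a placeholder; import pipelines.py from your own project package
from myproject import pipelines


class MySpider(scrapy.Spider):
    name = 'myspider'  # placeholder spider name

    # only these pipeline classes will process this spider's items
    pipeline = {
        pipelines.Save,
        pipelines.Validate,
    }

    def parse(self, response):
        # insert scrapy goodness here to build the item
        item = {}
        yield item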

Then, in the pipelines.py file:

# check_spider_pipeline is the decorator defined above


class Save:

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item


class Validate:

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item
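The decorator's effect can be checked without running a crawl. The following is a minimal sketch, not part of the original answer: it re-declares a trimmed version of the decorator (logging removed) and uses two hypothetical stand-in classes, FullSpider and ValidateOnlySpider, in place of real Scrapy spiders. The run_pipelines helper is also an assumption for illustration; it only mimics Scrapy handing one item to each enabled pipeline in turn.

```python
import functools


def check_spider_pipeline(process_item_method):
    """Run the wrapped process_item only if the spider lists this pipeline."""

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        if self.__class__ in spider.pipeline:
            return process_item_method(self, item, spider)
        return item  # pass the item through untouched

    return wrapper


class Validate:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["validated"] = True
        return item


class Save:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["saved"] = True
        return item


# Hypothetical stand-ins for spiders: only the `pipeline` attribute matters.
class FullSpider:
    pipeline = {Validate, Save}


class ValidateOnlySpider:
    pipeline = {Validate}


def run_pipelines(spider):
    """Pass one empty item through both pipelines, in a fixed order."""
    item = {}
    for stage in (Validate(), Save()):
        item = stage.process_item(item, spider)
    return item


print(run_pipelines(FullSpider()))          # {'validated': True, 'saved': True}
print(run_pipelines(ValidateOnlySpider()))  # {'validated': True}
```

Both pipelines are invoked for both spiders; the skipping happens inside the wrapper, which is why every pipeline still has to be enabled globally.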

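All Pipeline objects must still be listed in ITEM_PIPELINES in the settings, in the correct order. (It would be nice to change this so that the order could be specified on the spider as well.)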
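As an illustration of that settings entry, a settings.py fragment might look like the sketch below. The package name myproject and the specific numbers are assumptions; Scrapy runs enabled pipelines in ascending order of these integer values, so Validate would run before Save here.

```python
# settings.py (sketch; "myproject" is a hypothetical package name)
ITEM_PIPELINES = {
    "myproject.pipelines.Validate": 100,  # lower value runs first
    "myproject.pipelines.Save": 200,      # runs after Validate
}
```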

Original by mstringer; translation licensed under CC BY-SA 3.0.

Community wiki


Posted on
2023-01-09

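Just remove all pipelines from the main settings and declare them inside the spider instead.

This defines the pipelines to use on a per-spider basis: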

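from scrapy.spiders.init import InitSpider


class testSpider(InitSpider):
    name = 'test'
    # per-spider settings; the pipelines listed here apply to this spider only
    custom_settings = {
        'ITEM_PIPELINES': {
            'app.MyPipeline': 400
        }
    }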

Original by Mirage; translation licensed under CC BY-SA 3.0.
