Python: overrode item_completed and file_path in Scrapy's FilesPipeline, but they are never called

pipelines.py:

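from scrapy import Request
from scrapy.exceptions import DropItem
from scrapy.pipelines.files import FilesPipeline

class ApkspiderPipeline(FilesPipeline):
    def get_media_requests(self, item, info):
        for file_url in item["file_urls"]:
            # file_path() cannot see the item, so pass the app name via meta
            yield Request(file_url, meta={"app_name": item["AppName"]})

    def item_completed(self, results, item, info):
        # Collect the storage paths of the successfully downloaded files
        file_paths = [x["path"] for ok, x in results if ok]
        print(file_paths)
        if not file_paths:
            raise DropItem("Item contains no files")
        return item

    def file_path(self, request, response=None, info=None):
        # `item` is not defined in this scope; read the name from request.meta
        media_guid = request.meta["app_name"] + ".apk"
        # Use the instance attribute request.url; Request.file_url does not exist
        prefix = 'http://sj.qq.com/myapp/category.htm?orgame=1&categoryId='
        return u'full/{0}/{1}'.format(request.url.replace(prefix, ''), media_guid)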

settings.py:

ITEM_PIPELINES = {
    #'apkSpider.pipelines.CheckPipeline': 300,
    #'apkSpider.pipelines.JsonWriterPipeline': 300,
    #'apkSpider.pipelines.ApkspiderPipeline': 300,
    'scrapy.pipelines.files.FilesPipeline': 1,
}
FILES_STORE = 'output'
FILES_EXPIRES = 90


The downloaded files are still named with SHA-1 hashes.


1 answer

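Your settings are wrong. With the current configuration, items are still being processed by the default pipeline. You need to register the class you overrode instead, along the lines of: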

'yourprojectname.pipelines.ApkspiderPipeline': 1
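
As a rough sketch of what the corrected settings.py might look like: the package name apkSpider is an assumption taken from the commented-out entries in the question, so substitute your own project name. The default FilesPipeline entry is replaced by the subclass, and the other two settings are kept from the question unchanged.

```python
# settings.py (sketch; 'apkSpider' is assumed from the commented-out
# entries in the question, replace it with your own project package name)
ITEM_PIPELINES = {
    # Register the subclass instead of scrapy.pipelines.files.FilesPipeline;
    # otherwise the overridden file_path and item_completed never run.
    'apkSpider.pipelines.ApkspiderPipeline': 1,
}
FILES_STORE = 'output'  # directory where downloaded files are stored
FILES_EXPIRES = 90      # number of days before a stored file is considered expired
```

Because ApkspiderPipeline inherits from FilesPipeline, the default entry does not need to stay alongside it; keeping both would send every item through two file pipelines.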