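When scraping images from web pages with Scrapy, I defined a custom class in the pipelines file that inherits from ImagesPipeline,
but when I run the program the custom pipeline is never executed and the item is not passed to it.
Here is the custom pipeline: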
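import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.project import get_project_settings


class Douyu3Pipeline(ImagesPipeline):
    # def process_item(self, item, spider):
    #     return item

    # Reads the custom "IMAGE_STORE" key from settings.py
    IMAGES_STORE = get_project_settings().get("IMAGE_STORE")

    def get_media_requests(self, item, info):
        # Debug marker to check whether this method is ever called
        print("1---------------------------1")
        image_url = item["imagelink"]
        yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # Keep the paths of the downloads that succeeded
        image_path = [x["path"] for ok, x in results if ok]
        # Rename the downloaded file after the item's nickname
        os.rename(self.IMAGES_STORE + "/" + image_path[0],
                  self.IMAGES_STORE + "/" + item["nickname"] + ".jpg")
        item['imagePath'] = self.IMAGES_STORE + '/' + item['nickname']
        return item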
The settings file is configured as follows:
BOT_NAME = 'douyu3'
SPIDER_MODULES = ['douyu3.spiders']
NEWSPIDER_MODULE = 'douyu3.spiders'
ROBOTSTXT_OBEY = True
# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'DYZB/1 CFNetwork/808.2.16 Darwin/16.3.0',
}
ITEM_PIPELINES = {
    'douyu3.pipelines.Douyu3Pipeline': 300,
}
IMAGE_STORE = "/Users/enritami/desktop/SearchEngineer/douyu3/Images"
The log output shows the problem:
Enabled item pipelines: [] (the custom pipeline is never added to the pipeline list)
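Solved. If you inherit from ImagesPipeline, the setting in the settings file must be named IMAGES_STORE, not IMAGE_STORE. As far as I can tell, ImagesPipeline reads IMAGES_STORE itself and raises NotConfigured when it is missing, so Scrapy drops the pipeline from the enabled list. The pipeline's own get_project_settings().get("IMAGE_STORE") lookup needs the same rename, otherwise it returns None.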
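To make the cause easier to see, here is a simplified model in plain Python of what I believe happens when Scrapy builds the pipeline list. It is only a sketch and not Scrapy's real code: NotConfigured, ImagesPipelineModel and enabled_pipelines are hypothetical stand-ins written for this post, and /tmp/Images is a placeholder path.

```python
class NotConfigured(Exception):
    """Hypothetical stand-in for scrapy.exceptions.NotConfigured."""


class ImagesPipelineModel:
    """Toy stand-in for ImagesPipeline: it refuses to start without IMAGES_STORE."""

    def __init__(self, store_uri):
        if not store_uri:
            raise NotConfigured("IMAGES_STORE is not set")
        self.store_uri = store_uri

    @classmethod
    def from_settings(cls, settings):
        # Only the exact key "IMAGES_STORE" is looked up
        return cls(settings.get("IMAGES_STORE"))


def enabled_pipelines(settings, pipeline_classes):
    """Build the pipeline list, skipping components that are not configured."""
    enabled = []
    for cls in pipeline_classes:
        try:
            enabled.append(cls.from_settings(settings))
        except NotConfigured:
            continue  # the pipeline is dropped instead of crashing the crawl
    return enabled


wrong = {"IMAGE_STORE": "/tmp/Images"}   # misspelled key, as in my settings
right = {"IMAGES_STORE": "/tmp/Images"}  # key the pipeline actually reads

print(len(enabled_pipelines(wrong, [ImagesPipelineModel])))  # 0
print(len(enabled_pipelines(right, [ImagesPipelineModel])))  # 1
```

With the misspelled key the list comes back empty, which matches the empty pipeline list in the log above. With IMAGES_STORE the pipeline is kept.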