Python Scrapy: KeyError raised in the pipeline at runtime, please help~


I've only just started learning Scrapy and know very little. Recently I tried crawling Suning and the results were a disaster, one error after another. Fortunately plenty of people have stepped into these pits before me, and searching on Baidu solved most of the problems, but one KeyError in the pipeline file has had me stuck for several nights, so I'm turning to you all. Code below:
settings:

ITEM_PIPELINES = {
    'weibo.pipelines.SuningPipeline': 200,
}

items:

class SuningItem(scrapy.Item):
    name = scrapy.Field()           # product name
    price = scrapy.Field()          # product price
    evaluate = scrapy.Field()       # number of positive reviews
    favorable_rate = scrapy.Field() # positive-review rate/count
    brand = scrapy.Field()          # product brand
    style = scrapy.Field()          # product style
    product_url = scrapy.Field()    # product URL
    storename = scrapy.Field()      # store name

The items file also contains another class with a different name; that shouldn't matter, right?

spider:

import scrapy
from weibo.items import SuningItem
from scrapy.http import Request
import re

class SuninSpider(scrapy.Spider):
    name = 'sunin'
    allowed_domains = ['suning.com']
    start_urls = ['https://search.suning.com/%E5%A5%B3%E9%9E%8B/']
    
    url = 'https://search.suning.com/%E5%A5%B3%E9%9E%8B/'

    # generate the Request objects
    def start_requests(self):
        for i in range(0, 99):
            url = self.url + '&cp=' + str(i)
            yield Request(url, callback = self.parse)
            
    def parse(self, response):
        suningitem = SuningItem()
        suningitem['product_url'] = response.xpath('//p[@class="sell-point"]/a/@href').extract()
        for i in range(0, len(suningitem['product_url'])):
            suningitem['product_url'][i] = "http:" + str(suningitem['product_url'][i])
        product_url = suningitem['product_url']

        for i in product_url:
            if 'http:' in i: 
                yield Request(i, meta={'key':suningitem}, callback= self.product_parse)
                # yield a Request for every scraped product URL, passing the item along via meta
            else:
                i = 'http:' + i
                yield Request(i, meta={'key':suningitem}, callback= self.product_parse)
        yield suningitem

    def product_parse(self, response):
        suningitem = response.meta['key']
        product_url = response.url
        #print("---------" + str(product_url) +  "------")
        suningitem['brand'] = response.xpath('//td[@class="val"]/a/text()').extract()
        suningitem['style'] = response.xpath('//tr/td/div/span[text()="款式"]/../../../td[@class="val"]/text()').extract()
        suningitem['name'] = response.xpath('//h1/text()').extract()
        suningitem['storename'] = response.xpath('//div[@class="si-intro-list"]/dl[1]/dd/a/text()').extract()

        proID_pat = re.compile('/(\d+)\.html')
        storeID_pat = re.compile('/(\d.+)/')
        proID = proID_pat.findall(product_url)[0]
        storeID = storeID_pat.findall(product_url)[0]
        favorate_url = 'https://review.suning.com/ajax/review_satisfy/style-000000000' + str(proID[0]) + '-' + str(storeID[0]) + '-----satisfy.htm?'
        yield Request(favorate_url, meta = {'key': suningitem}, callback = self.favorate_parse)
        price_url = 'https://pas.suning.com/nspcsale_1_000000000' + str(proID) + '_000000000' + str(proID) + '_' + str(storeID) + '_60_319_3190101_361003_1000103_9103_10766_Z001___R9001225.html'
        yield Request(price_url, meta={'key': suningitem}, callback=self.price_parse)

        yield suningitem

    def favorate_parse(self, response):
        suningitem = response.meta['key']
        #suningitem['favorable'] = response.xpath('').extract()
        #suningitem['evaluate'] = response.xpath('').extract()
        fa_pat = re.compile('"totalCount":(\d+),')
        eva_pat = re.compile('"fiveStarCount":(\d+),')
        
        suningitem['evaluate'] = eva_pat.findall(response)[0]
        suningitem['favorable'] = fa_pat.findall(response)[0]
        yield suningitem
        
    def price_parse(self, response):
        suningitem = response.meta['key']
        price_pat = re.compile('"netPrice":"(\d*?\.\d*?)"')
        suningitem['price'] = price_pat.findall(response.body.decode('utf-8', 'ignore'))[0]

        yield suningitem
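
One thing worth knowing about the meta pattern above: parse creates a single SuningItem and hands that same object to every product Request, so all the concurrent callbacks read and write one shared item and overwrite each other's fields. A minimal sketch of the per-request alternative, storing one URL per item instead of the whole list (illustrative, uses the spider's existing imports, not the asker's code):

    # sketch: one independent item per product request
    def parse(self, response):
        for href in response.xpath('//p[@class="sell-point"]/a/@href').extract():
            url = href if href.startswith('http') else 'http:' + href
            item = SuningItem()        # fresh item, not shared across requests
            item['product_url'] = url
            yield Request(url, meta={'key': item}, callback=self.product_parse)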

pipelines:

class SuningPipeline(object):
    def __init__(self):
        self.lfile = open(r'd:\desktop\苏宁data.csv', 'w', encoding='utf-8')
    
    def process_item(self, item, spider):
        ''' name = item['name']
        price = item['price']
        evaluate = item['evaluate']
        favorable = item['favorable']
        brand = item['brand']
        style = item['style']
        product_url = item['product_url']
        storename = item['storename']
            
        for i in range(0, len(name)-1):
            line = name[i] + ',' + price[i] + ',' + evaluate[i] + ',' + favorable[i] + ',' + brand[i] + ',' + style[i] + ','\
            + product_url[i] + ',' + storename[i] + '\r\n'
            self.lfile.write(line) '''

        # The commented-out block above was my first attempt at writing to a local CSV file.
        # That approach worked for me before, and I don't know why it raises KeyError this time.
        # The version below I picked up somewhere I no longer remember, and it raises KeyError too...
        
        self.lfile.write(
            str(item['name']) + ',' +
            str(item['price']) + ',' + 
            str(item['evaluate']) + ',' + 
            str(item['favorable']) + ',' +
            str(item['style']) + ',' +
            str(item['product_url']) + ',' +
            str(item['storename']) + '\r\n'
        )

        return item
    
    def close_spider(self, spider):
        self.lfile.close()
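
For comparison, a pipeline that writes rows through csv.writer and reads fields with item.get() keeps running when an item is missing some fields, instead of raising KeyError. A minimal sketch, assuming the same field names declared in the item above:

import csv

class SuningCsvPipeline(object):
    def open_spider(self, spider):
        self.lfile = open(r'd:\desktop\苏宁data.csv', 'w', encoding='utf-8', newline='')
        self.writer = csv.writer(self.lfile)

    def process_item(self, item, spider):
        # item.get() returns the default instead of raising KeyError for unset fields
        self.writer.writerow([
            item.get('name', ''),
            item.get('price', ''),
            item.get('evaluate', ''),
            item.get('favorable_rate', ''),
            item.get('style', ''),
            item.get('product_url', ''),
            item.get('storename', ''),
        ])
        return item

    def close_spider(self, spider):
        self.lfile.close()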

If you have time, please take a look and point me in the right direction. If there are other problems, or better approaches, please point those out too; I'd be very grateful. Thanks in advance!

3 Answers

Try removing the final yield suningitem at the end of product_parse, price_parse, and favorate_parse and see if that helps.
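
That advice gets at the root cause: every yield suningitem sends the item through the pipeline, and an item yielded from parse or product_parse has not had name, price, evaluate, and so on set yet, so the pipeline's item['price'] and friends raise KeyError on the unset fields. One way to guarantee the item goes out exactly once, fully populated, is to chain the two AJAX requests instead of issuing them in parallel. A minimal sketch under that assumption; price_url being passed along in meta is my addition, and note re.findall needs a string, hence response.text rather than the Response object:

# sketch: chain review -> price so the finished item is yielded exactly once
def favorate_parse(self, response):
    suningitem = response.meta['key']
    body = response.text                       # re.findall needs a string, not a Response
    suningitem['evaluate'] = re.findall(r'"fiveStarCount":(\d+),', body)[0]
    suningitem['favorable_rate'] = re.findall(r'"totalCount":(\d+),', body)[0]
    # price_url is assumed to have been put into meta by product_parse
    yield Request(response.meta['price_url'], meta={'key': suningitem}, callback=self.price_parse)

def price_parse(self, response):
    suningitem = response.meta['key']
    suningitem['price'] = re.findall(r'"netPrice":"(\d*?\.\d*?)"', response.text)[0]
    yield suningitem                           # the only yield of the completed item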

First, when asking a question, it's best to include the actual error traceback.
As for the KeyError you mentioned, I checked your code:

Item definition:
favorable_rate = scrapy.Field()

Actual usage:
suningitem['favorable'] = fa_pat.findall(response)[0]

There is no favorable field anywhere in the item definition, so isn't the KeyError exactly what you'd expect? The item declares favorable_rate, but item['favorable'] uses a key that doesn't exist.
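
For anyone else landing here: a scrapy.Item raises KeyError the moment you assign to a key that was never declared as a Field, which is exactly the situation above. A minimal reproduction:

import scrapy

class SuningItem(scrapy.Item):
    favorable_rate = scrapy.Field()

item = SuningItem()
item['favorable_rate'] = '99%'   # fine: the field is declared
item['favorable'] = '99%'        # KeyError: SuningItem does not support field: favorable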
