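I am using scrapy + redis to fetch product details from a batch of Taobao item detail page URLs.
The User-Agent is set, cookies are passed in, and a proxy IP is configured.
When fetching the URLs, response.status is sometimes 200 and sometimes 302, changing seemingly at random.
Out of 1000 URLs, product information is retrieved successfully for a little over 400.
Is the cookie not being passed correctly, or is the proxy IP unstable? Or is there some other cause? Please help me analyze this, thanks!
Error traceback: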
2017-07-14 15:51:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0#detail> (referer: None)
2017-07-14 15:51:12 [requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): rate.taobao.com
2017-07-14 15:51:12 [requests.packages.urllib3.connectionpool] DEBUG: "GET /detailCommon.htm?auctionNumId=10245430841 HTTP/1.1" 200 None
2017-07-14 15:51:12 [scrapy.core.scraper] DEBUG: Scraped from <200 https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0>
None
2017-07-14 15:51:12 [taobao] DEBUG: Read 1 requests from 'taobao:start_urls'
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://item.taobao.com/item.htm?id=10245681616&ns=1&abbucket=0#detail>
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> from <GET https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0#detail>
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0>
2017-07-14 15:51:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> (referer: None) ['partial']
2017-07-14 15:51:12 [scrapy.core.scraper] ERROR: Spider error processing <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> (referer: None)
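I found the cause of the exception: the User-Agent list I imported contained mobile UAs. Whenever one of those was used, the request was redirected (302) to the mobile page at h5.m.taobao.com, which the spider could not parse. After removing the mobile UAs, the problem went away.
I have put together an up-to-date 2017 ua_list (desktop only) for everyone: https://github.com/lovebaicai...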
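For anyone hitting the same redirect, here is a minimal sketch of the fix: drop mobile entries from the UA list before the downloader middleware picks one at random. The names `MOBILE_MARKERS`, `desktop_only`, `RandomDesktopUserAgentMiddleware`, and the `USER_AGENT_LIST` setting are made up for illustration (they are not from my project or from Scrapy itself), and the marker list is a rough heuristic, not a complete mobile detector.

```python
import random

# Substrings that commonly appear in mobile browser UAs.
# Assumption: an illustrative, non-exhaustive heuristic.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone", "iPad", "Windows Phone")


def is_mobile_ua(ua):
    """Return True if the user-agent string looks like a mobile browser."""
    lowered = ua.lower()
    return any(marker.lower() in lowered for marker in MOBILE_MARKERS)


def desktop_only(ua_list):
    """Keep only the user agents that do not look mobile."""
    return [ua for ua in ua_list if not is_mobile_ua(ua)]


class RandomDesktopUserAgentMiddleware(object):
    """Downloader middleware sketch: set a random desktop UA on each request."""

    def __init__(self, ua_list):
        self.ua_list = desktop_only(ua_list)
        if not self.ua_list:
            raise ValueError("no desktop user agents left after filtering")

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical custom setting holding the UA strings.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.ua_list)
        return None  # let Scrapy continue processing the request normally
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in settings.py with the UA strings placed in the assumed `USER_AGENT_LIST` setting. Filtering once in `__init__` means a mobile UA that slips into the list later cannot trigger the 302 to h5.m.taobao.com again.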