Why isn't my Scrapy spider downloading anything?

luodapao

Posted on 2019-05-24

Updated on 2019-05-24

When I use Scrapy, parse already extracts the items and yields them, and the terminal shows that they were scraped. So why is nothing downloaded?

xH.py

# -*- coding: utf-8 -*-
import scrapy
from showXh.items import ShowxhItem

class XhSpider(scrapy.Spider):
    name = 'xH'
    allowed_domains = ['xiaohua.com']
    start_urls = ['https://www.xiaohua.com/pic/']


    def parse(self, response):
        # Each .one-cont block holds one picture and its caption.
        for each in response.css('.one-cont'):
            item = ShowxhItem()
            # Lazy-loaded images keep the real URL in data-src; extract() returns a list.
            item['img_src'] = each.css('.lazyload::attr("data-src")').extract()
            # Note: this stores the raw HTML of the <a> element, not just its text.
            item['img_info'] = each.css('.fonts>a').extract()
            yield item

items.py


import scrapy


class ShowxhItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    img_src=scrapy.Field()
    img_info=scrapy.Field()

pipelines.py

from scrapy.pipelines.images import ImagesPipeline

class ShowxhPipeline(object):
    def process_item(self, item, spider):
        return item

settings.py

# -*- coding: utf-8 -*-
from scrapy.pipelines.images import ImagesPipeline

BOT_NAME = 'showXh'

SPIDER_MODULES = ['showXh.spiders']
# NEWSPIDER_MODULE = 'showXh.spiders'
ITEM_PIPELINES = {
   'showXh.pipelines.ShowxhPipeline':1,
}

IMAGE_URS_FIELD='img_src'
IMAGE_STORE=r'.'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'showXh (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

Terminal output

PS C:\Users\Administrator\Desktop\demo\scrapys\showXh> scrapy crawl xH
2019-05-24 15:06:31 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: showXh)
2019-05-24 15:06:31 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.8, cssselect 1.0.1, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17763-SP0
2019-05-24 15:06:31 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'showXh', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['showXh.spiders']}
2019-05-24 15:06:31 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-05-24 15:06:31 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-05-24 15:06:31 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-05-24 15:06:31 [scrapy.middleware] INFO: Enabled item pipelines:
['showXh.pipelines.ShowxhPipeline']
2019-05-24 15:06:31 [scrapy.core.engine] INFO: Spider opened
2019-05-24 15:06:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-05-24 15:06:31 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-05-24 15:06:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xiaohua.com/robots.txt> (referer: None)
2019-05-24 15:06:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xiaohua.com/pic/> (referer: None)
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/103312">求中间大爷的心里想法</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/103/103312_20180514200545873_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/123771">单身要从娃娃抓起</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905166369362458445779598997650.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/102914">5度角行车，这是怎么办到的？！</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/102/102914_20180517123615391_0.gif']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/124168">学委使用屏保的正确方式</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905226369414313739734101453252.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/6109">这辈子的幸福毁在了人类的手上</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/6/6109_20180529032147796_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/1852">求你戳瞎我的眼!</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/1/1852_20180529173223076_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/48441">夏日解暑的必备良药</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/48/48441_20180516145153468_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/102947">亮点在哪里</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/102/102947_20180517123624455_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/123794">这谁顶得住</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905166369362482604275744160611.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/124172">这个词有什么问题吗老师？</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905226369414318239244519201005.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/123773">工程鼓励员</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905166369362460765702797508736.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/9642">全班唯一一个女生</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/9/9642_20180526060354940_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/103692">意义何在？</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/103/103692_20180514221335764_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/17514">好贴心的楼，竟然为老王开设专用通道</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/17/17514_20180525110630046_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/103611">三个妹子来面试，老板却把中间那个录取了？</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/103/103611_20180514221218219_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/61975">凳子是无辜的</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/61/61975_20180515184246748_0.gif']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/123981">我怀疑你在开车，证据确凿</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201905206369396987197160239284618.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/62958">每次见到自己喜欢的明星上场时的你</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/62/62958_20180515171625299_0.gif']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/103605">看你下次还敢来女宿舍不敢了。</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/0/103/103605_20180514192638646_0.jpg']}
2019-05-24 15:06:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.xiaohua.com/pic/>
{'img_info': ['<a href="/detail/122684">只要饮食搭配合理，长胖是不可能的</a>'],
 'img_src': ['https://img.xiaohua.com/Picture/201904286369207041530885163360414.jpg']}
2019-05-24 15:06:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-05-24 15:06:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 442,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 12605,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 5, 24, 7, 6, 32, 349909),
 'item_scraped_count': 20,
 'log_count/DEBUG': 23,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2019, 5, 24, 7, 6, 31, 623499)}
2019-05-24 15:06:32 [scrapy.core.engine] INFO: Spider closed (finished)

1 Answer

lpe234

Posted on 2019-05-25

✓ Accepted answer

Friend, you really need to read the documentation carefully.

# You have to actually enable the image pipeline for it to do anything
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
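
Enabling the pipeline is probably not enough on its own here. The settings.py in the question uses IMAGE_URS_FIELD and IMAGE_STORE, which are not names Scrapy reads. The image pipeline looks for IMAGES_URLS_FIELD and IMAGES_STORE, and it stays disabled when IMAGES_STORE is unset. It also needs Pillow installed. A minimal settings.py sketch, assuming the images should be saved to a directory named images (the directory name is an assumption, not something from the question):

```python
# settings.py (sketch): only the image-related settings are shown.

# Enable Scrapy's built-in image pipeline.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Item field that holds the list of image URLs (img_src in the question's item).
IMAGES_URLS_FIELD = 'img_src'

# Directory where downloaded images are stored.
# 'images' is an assumed name; any writable path should work.
IMAGES_STORE = 'images'
```

With these settings, the "from scrapy.pipelines.images import ImagesPipeline" line at the top of settings.py is unnecessary: pipelines are enabled through ITEM_PIPELINES, not by importing them.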

You defined a custom ShowxhPipeline, but it doesn't do any processing: process_item just returns the item unchanged.
