Scrapy spider runs but returns nothing

My idea is to enter a movie title and have the spider return that movie's information.
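
The first step of that idea, turning a typed title into a search URL, can be sketched as follows. This is only a sketch under assumptions: it is written for Python 3 (the spider below is Python 2), build_search_url is a hypothetical helper name, and the endpoint is simply the one used in the spider. It percent-encodes the title as UTF-8 rather than concatenating the raw string.

```python
from urllib.parse import quote

# Search endpoint copied from the spider in the question.
SEARCH_URL = "http://movie.douban.com/subject_search?search_text="


def build_search_url(movie_name):
    """Return the search URL for a title, percent-encoding it as UTF-8.

    build_search_url is a hypothetical helper, not part of Scrapy.
    """
    return SEARCH_URL + quote(movie_name)


print(build_search_url("移动迷宫"))
```

For the title 移动迷宫 this produces the same encoded URL that is hard-coded in start_urls.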

# -*- coding: utf-8 -*-
import sys
sys.path.append("..")
reload(sys)
sys.setdefaultencoding('utf8')
from scrapy.spider import Spider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import Rule,CrawlSpider
from items import doubanSpiderItem
from scrapy.contrib.linkextractors import LinkExtractor

class doubanSpider(CrawlSpider):

    name = 'doubanSpider'
    allowed_domains=[]
    start_urls = ['http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB']

    def start_requests(self):
        movie_name = raw_input("输入电影名:")
        try:
            url_head = "http://movie.douban.com/subject_search?search_text="
            self.start_urls.append(url_head+str(movie_name))
            for url in self.start_urls:
                yield self.make_requests_from_url(url)
        except:
            print "can not connect"
            # fetch the movie search results page

    def parse(self, response):

        sel=Selector(response)
        print sel

        movie_link = sel.xpath("//div[@class='pl2']/a/@href/text()").extract()
        print movie_link
        if movie_link:
             yield Request(movie_link[0],callback=self.parse_item)
        # follow the link to the page of the movie that was searched for
    def parse_item(self,response):
        sel = Selector(response)
        movie_name = sel.xpath("//span[@property = 'v:itemreviewed']/text()").extract()
        print movie_name
        

That is my code. Below is the terminal output.

timmys-MacBook-Pro:spiders apple$ scrapy crawl doubanSpider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:6: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:11: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
  from scrapy.contrib.linkextractors import LinkExtractor
2015-11-08 20:50:51 [scrapy] INFO: Scrapy 1.0.3 started (bot: doubanSpider)
2015-11-08 20:50:51 [scrapy] INFO: Optional features available: ssl, http11
2015-11-08 20:50:51 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'doubanSpider.spiders', 'SPIDER_MODULES': ['doubanSpider.spiders'], 'BOT_NAME': 'doubanSpider'}
2015-11-08 20:50:51 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-11-08 20:50:51 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-11-08 20:50:51 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-11-08 20:50:51 [scrapy] INFO: Enabled item pipelines: doubanSpiderPipeline
2015-11-08 20:50:51 [scrapy] INFO: Spider opened
2015-11-08 20:50:51 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-11-08 20:50:51 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
输入电影名:移动迷宫
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
2015-11-08 20:50:58 [scrapy] INFO: Closing spider (finished)
2015-11-08 20:50:58 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 554,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 16250,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 11, 8, 12, 50, 58, 566941),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2015, 11, 8, 12, 50, 51, 888328)}
2015-11-08 20:50:58 [scrapy] INFO: Spider closed (finished)
timmys-MacBook-Pro:spiders apple$  

And here is the Douban HTML:
[image]

1 answer

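Change your XPath. Who writes it like that? @href/text() matches nothing: @href already selects the attribute itself, and an attribute node has no text() child, so extract() returns an empty list. Use //div[@class='pl2']/a/@href instead.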
