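While writing a crawler, I need to first send a request that submits a form, get the JSESSIONID cookie returned for that form, and then send a second request with that cookie to fetch the data. However, the cookie is only valid within a single session. How can I keep these two requests in the same session? Any explanation would be appreciated.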
The code is as follows:

```python
# Python 2; the two functions below are methods of a scrapy.Spider subclass
import urllib
import scrapy

def start_requests(self):
    form_data = {
        'sourceid': 'sdal',
        'apptype': '',
        'appcode': '',
        'developer': '',
        'channel': '',
        'usertype': '',
        'userid': '',
        'actionCode': '20',
        'achange': '',
        'airname': '山航',
        'airways': 'SC',
        'discount': '',
        'fare': '',
        'fnumber': '',
        'ftime': '',
        'orderIndex': '',
        'planetype': '',
        'ttime': '',
        'vendorid': '',
        'seatClass': '',
        'afromN': '济南',
        'atoN': '厦门',
        'afrom': 'TNA',
        'ato': 'XMN',
        'fdate_submit': '20170714',
        'tdate_submit': '',
    }
    # URL-encode the form fields into the POST body
    s = urllib.urlencode(form_data)
    print s
    yield scrapy.Request('http://m.shandongair.com/mainSrv', body=s, headers=self.headers,
                         method='POST', meta=self.meta, callback=self.parse)

def parse(self, response):
    # Read the Set-Cookie headers returned by the form submission
    cookie = response.headers.getlist('Set-Cookie')
    print 'Cookie', cookie
    url = 'http://m.shandongair.com/jsp/flight/flightresult.jsp'
    jess_cookie = self.parse_cookies(cookie[0])
    print jess_cookie
    # meta may only be passed once, so merge the cookiejar key into self.meta
    return scrapy.Request(url, headers=self.headers, dont_filter=True,
                          meta=dict(self.meta, cookiejar=response.meta['cookiejar']),
                          method='GET', callback=self.parse_res)
```
The resulting log output is as follows:
2017-07-14 10:26:27 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 http://m.shandongair.com/mainSrv>
Set-Cookie: JSESSIONID=0FFD13B49DAB8E6BD267873193A326B7; Path=/
2017-07-14 10:26:27 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://m.shandongair.com/mainSrv> (referer: None)
Cookie ['JSESSIONID=0FFD13B49DAB8E6BD267873193A326B7; Path=/']
{'JSESSIONID': '0FFD13B49DAB8E6BD267873193A326B7'}
2017-07-14 10:26:27 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://m.shandongair.com/jsp/flight/flightresult.jsp>
Cookie: JSESSIONID=0FFD13B49DAB8E6BD267873193A326B7
2017-07-14 10:26:35 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <500 http://m.shandongair.com/jsp/flight/flightresult.jsp>
Set-Cookie: JSESSIONID=A685C5C215EA85A2A665365C5A063175; Path=/
2017-07-14 10:26:35 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://m.shandongair.com/jsp/flight/flightresult.jsp> (failed 1 times): 500 Internal Server Error
2017-07-14 10:26:35 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://m.shandongair.com/jsp/flight/flightresult.jsp>
Cookie: JSESSIONID=A685C5C215EA85A2A665365C5A063175
2017-07-14 10:26:42 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://m.shandongair.com/jsp/flight/flightresult.jsp> (failed 2 times): 500 Internal Server Error
2017-07-14 10:26:42 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://m.shandongair.com/jsp/flight/flightresult.jsp>
Cookie: JSESSIONID=A685C5C215EA85A2A665365C5A063175
2017-07-14 10:26:48 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://m.shandongair.com/jsp/flight/flightresult.jsp> (failed 3 times): 500 Internal Server Error
2017-07-14 10:26:48 [scrapy.core.engine] DEBUG: Crawled (500) <GET http://m.shandongair.com/jsp/flight/flightresult.jsp> (referer: http://m.shandongair.com/mainSrv) ['partial']
2017-07-14 10:26:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://m.shandongair.com/jsp/flight/flightresult.jsp>: HTTP status code is not handled or not allowed
2017-07-14 10:26:48 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-14 10:26:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
Use the `cookiejar` request meta key.
See the cookiejar documentation.