Python crawler download redirect problem: the status code is always 302, yet downloading manually in a browser works fine?

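1. I got a download address A from a web page. When I click it in a browser, it always redirects to the real download address B and the browser starts the download automatically, so everything works in the browser. With my own crawler, however, I can never find the real download address B: the status code is always 302, and the downloaded torrent file is 0 bytes.
2. Here is my code: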
import requests
from lxml import etree

url = 'http://www.gawu88.space/forum...'
headers = {

'Cookie': '__cfduid=d15f7eb39310b0301f07e1f744ca70a3d1526800937; _ga=GA1.2.942865751.1526800940; A8tI_2132_saltkey=njU69xqb; A8tI_2132_lastvisit=1526797339; A8tI_2132_adult_warn=1; A8tI_2132_auth=7d44BRr5TCxDGN9zYzcgtvgTYZzopZtEOJjzAO323fO%2BdvFoIjRzKH31yzmid2IjzmB9bQ5PLK%2B1iWLRV%2BnD6zp8PwkV; A8tI_2132_lastcheckfeed=7589318%7C1526800977; A8tI_2132_smile=2D1; A8tI_2132_atarget=1; _gid=GA1.2.849215201.1527331040; A8tI_2132_adv_gid=18; A8tI_2132_noticeTitle=1; A8tI_2132_self_unique_code=0856eca0-298c-1e72-91a0-4b6652a8c21d; cus_cookie=3; A8tI_2132_ignore_notice=1; A8tI_2132_notification_unread_tips=1527595369; __insp_wid=1484672786; __insp_nv=true; __insp_targlpu=aHR0cDovL3d3dy5nYXd1ODguc3BhY2UvdGhyZWFkLTk0Nzg5MTgtMS0xLmh0bWw%3D; __insp_targlpt=44CQ44CA44CA44CA44CR44CQTVA0LzEuNDdH44CR6LaF5q2j6aKc5YC855qE5p6B5ZOB6Z_p6LaK5re36KGA5aup5aa556ysMuWto_e_juiFv_m7keS4neS4pOWwj_aXtuWFqOijuOiHquaRuOivseaDkeiJs_iInuengC3mnY%2FlkKdf5oCn5ZCnX3NleDhf5p2P5ZCn5pyJ5L2g5pil5pqW6Iqx5byALeadj_WQpy3mgKflkKctc2V4OC3ljY7kurrmgKfniLHkuIvovb3ljLp8Q2hpbmVzZSBzZXggQlQt5p2P5ZCn5pyJ5L2gLOaYpeaaluiKseW8gA%3D%3D; __insp_norec_sess=true; A8tI_2132_sign_close=1; A8tI_2132_credit_max_num=0; A8tI_2132_credit_remain_num=0; A8tI_2132_ulastactivity=1527595671%7C0; A8tI_2132_home_diymode=1; A8tI_2132_sendmail=1; A8tI_2132_visitedfid=798D216D227D181D815D307D791D11D180D142; A8tI_2132_self_uid=7589318; A8tI_2132_self_fid=798; A8tI_2132_st_t=7589318%7C1527595792%7C2a861e2fa44511158f2204dd33027dbf; A8tI_2132_forum_lastvisit=D_180_1526811032D_815_1527520227D_181_1527522817D_227_1527595414D_216_1527595671D_798_1527595792; A8tI_2132_viewid=tid_9478266; A8tI_2132_self_tid=9478266; A8tI_2132_st_p=7589318%7C1527595855%7Cacd971897d6231863e34e40a030708b9; _gat=1; _gat_gtag_UA_117992228_1=1; _gat_gtag_UA_115157189_1=1; A8tI_2132_seccode=131784332.cc6482a36d7edec35d; __insp_slim=1527595859229; A8tI_2132_lastact=1527595889%09forum.php%09attachment',
'Host':'www.wifi588.net:443',
'Referer':'http://www.wifi588.net',
'Accept-Encoding':'',
'Connection':'keep-alive',
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'

}
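# Fetch the forum page and extract the attachment link (address A)
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
hrefs = 'http://www.gawu88.space/' + html.xpath('//p[@class="attnm"]/a/@href')[0]
# Request address A; note that this call does not pass the headers above
req = requests.get(hrefs)
print(req.status_code)
file_name = "f:/1.torrent"
# The with block closes the file automatically
with open(file_name, "wb") as f:
    f.write(req.content)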

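3. Here req is the request for address A (hrefs), which was parsed from the page. Visiting A in a browser leads to the real download address B, but my program never gets it. I would be very grateful if someone could take a look. Isn't requests supposed to follow redirects automatically? Why does printing req.url still show the original address? Thanks again, everyone.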

1 answer

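First, you need to understand what the 302 status code means.

301 and 302 are both HTTP status codes, and both indicate that a URL has moved. The difference is:

301 redirect: the resource has moved permanently (Moved Permanently).
302 redirect: the resource has moved temporarily (Moved Temporarily; current HTTP specifications name it "Found").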
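As a quick reference, the two codes can be looked up in Python's standard library. This is only an illustrative sketch; the helper name `describe` is made up for this example.

```python
from http import HTTPStatus


def describe(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (301, 302):
    print(describe(code))
```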

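The browser then reads the new address from the response headers, namely the Location header.
So you need to request A first, then parse the response headers to get the real download address B.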
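The steps above can be sketched roughly as follows. requests normally follows redirects for GET on its own; passing `allow_redirects=False` makes it return the 3xx response itself so that `Location` can be read by hand. The function names `resolve_redirects` and `download`, the hop limit, and the idea that the site wants the same cookies on every hop are assumptions for illustration, not something confirmed by the question.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(session, url, max_hops=5):
    """Follow redirects by hand; return (final_response, urls_visited)."""
    visited = [url]
    for _ in range(max_hops):
        # allow_redirects=False hands back the 3xx response instead of following it
        resp = session.get(url, allow_redirects=False)
        location = resp.headers.get("Location")
        if resp.status_code not in REDIRECT_CODES or not location:
            # Either the final response, or a redirect with no Location to follow
            return resp, visited
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        visited.append(url)
    raise RuntimeError("too many redirects")


def download(url_a, headers, path):
    """Hypothetical wrapper: fetch address A with a shared session and save the result."""
    import requests  # imported here so resolve_redirects stays dependency-free

    session = requests.Session()
    # Assumption: the same headers (including Cookie) are needed on every hop
    session.headers.update(headers)
    resp, visited = resolve_redirects(session, url_a)
    print(resp.status_code, visited)
    with open(path, "wb") as f:
        f.write(resp.content)
    return resp
```

With the code from the question, this would be called as `download(hrefs, headers, "f:/1.torrent")`. If the final status is still 302 and the list of visited URLs contains only A, the server sent a redirect without a usable `Location`, which is worth checking against what the browser sends (cookies, Referer) for that request.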
