Asking the experts: when fetching a page with urllib2, why does the request succeed but return an empty value?

keeponlight
  • 18

Here is one page that shows this kind of problem: http://detail.zol.com.cn/inde...

Test code:
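import urllib2

url = 'http://detail.zol.com.cn/inde...'
response = None
try:
    response = urllib2.urlopen(url, timeout=5)
    html = response.read()
    print html
    print "hehe"
except urllib2.URLError as e:
    # HTTPError (a URLError subclass) carries a status code;
    # a plain URLError only carries a reason.
    if hasattr(e, 'code'):
        print 'Error code:', e.code
    elif hasattr(e, 'reason'):
        print 'Reason:', e.reason
finally:
    # Close the connection only if the request got far enough to open one.
    if response:
        response.close()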

Output: C:\Python27\python.exe C:/Users/Administrator/PycharmProjects/untitled/data02
hehe

Process finished with exit code 0

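This version, which adds Referer and User-Agent headers, also comes back empty:
import urllib2
from bs4 import BeautifulSoup

# url is the same as in the test code above
page = urllib2.Request(url)
page.add_header('Referer', url)
page.add_header('User-Agent', "Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0")
r = urllib2.urlopen(page, timeout=5.0)
html = r.read()
soup = BeautifulSoup(html, 'lxml')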
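
To confirm that the server really answers successfully with an empty body (rather than the body being lost somewhere afterwards), it can help to print the status code and the body length. The following is only a minimal sketch: `summarize` and `fetch_summary` are hypothetical helper names, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # assumed fallback for Python 3


def summarize(status, body):
    """Describe a response by its status code and body length."""
    note = ' (empty body)' if len(body) == 0 else ''
    return 'status=%s length=%d%s' % (status, len(body), note)


def fetch_summary(url, timeout=5):
    """Fetch url and return a short summary instead of the raw HTML."""
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        return summarize(response.getcode(), response.read())
    finally:
        response.close()
```

A result such as `status=200 length=0 (empty body)` would suggest the server itself is deliberately sending nothing back to this client.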

2 answers
✓ Accepted

The brute-force way: send the cookie along with the request.

import requests

url = 'http://detail.zol.com.cn/index.php?c=SearchList&keyword=coolpad_8297_w01'

headers = {
    'Cookie': 'userProvinceId=2; userCityId=0; userLocationId=26; proIp=123; ip_ck=4cKD5vP/j7QuNjUyMTk4LjE0Njk0Mzg5MzQ%3D; lv=1469438963; vn=1; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1469438964; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1469438964; z_day=rdetail=1; z_pro_city=s_provice%3Dshanghai%26s_city%3Dxingqu',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'
}
r = requests.get(url, headers=headers)
print r.text
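
If pasting the whole cookie string into the headers feels unwieldy, one alternative is to split it into a dict and pass it through the `cookies` argument of `requests.get`. This is just a sketch of that idea: `parse_cookie_header` and `fetch_with_cookies` are hypothetical helpers, not part of requests, and the cookie values themselves still have to be copied from a real browser session.

```python
def parse_cookie_header(raw):
    """Split a browser 'Cookie' header string into a name -> value dict."""
    cookies = {}
    for pair in raw.split(';'):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = pair.strip().partition('=')
        if sep:
            cookies[name] = value
    return cookies


def fetch_with_cookies(url, raw_cookie, user_agent):
    """GET url, sending the parsed cookies and a browser-like User-Agent."""
    import requests  # third-party; imported here so the parser works without it

    response = requests.get(
        url,
        cookies=parse_cookie_header(raw_cookie),
        headers={'User-Agent': user_agent},
        timeout=5,
    )
    return response.text
```

For example, `parse_cookie_header('userProvinceId=2; z_day=rdetail=1')` keeps `rdetail=1` intact as the value of `z_day`.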

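I cleared the browser's cookies and visited the page again, and it came back empty there too. After some analysis, it turns out the cookie is encrypted and set by JavaScript. If your JS is good, you could try to work out how it is generated; if that gets nowhere, just use selenium to drive Chrome.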
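
A rough sketch of the selenium route, assuming selenium and a matching Chrome driver are installed: let a real browser run the site's JavaScript, then read back the rendered HTML and the cookies it ended up with. `cookies_to_header` and `fetch_with_browser` are hypothetical names, and the page may need extra waiting before the JS-set cookie and final content are in place, which this sketch does not handle.

```python
def cookies_to_header(cookies):
    """Join selenium-style cookie dicts into a single Cookie header value."""
    return '; '.join('%s=%s' % (c['name'], c['value']) for c in cookies)


def fetch_with_browser(url):
    """Load url in headless Chrome; return (page HTML, Cookie header string)."""
    from selenium import webdriver  # third-party; imported only when needed

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # get_cookies() returns a list of dicts with 'name' and 'value' keys.
        return driver.page_source, cookies_to_header(driver.get_cookies())
    finally:
        driver.quit()
```

The returned cookie string could then be reused as the `Cookie` header in a plain requests call, at least until the site rotates it.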
