My crawler code ran fine on Ubuntu, but after I recently switched to Win10 it throws this exception:
urllib2.URLError: <urlopen error no host given>
I first suspected a problem with the target site, so I opened a command prompt and tested against Baidu — same result. It fails identically whether I use urllib2 on Python 2.7 or urllib.request on Python 3.
The test code is a single line:
page = urllib2.urlopen("http://www.baidu.com")
Here is the traceback:
Traceback (most recent call last):
  File "E:/myProject/DeepLearing/Machine_Learning/test.py", line 85, in <module>
    page = urllib2.urlopen("http://www.baidu.com")
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 449, in _open
    '_open', req)
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "E:/Anaconda3/envs/envpy2/lib/urllib2.py", line 1163, in do_open
    raise URLError('no host given')
urllib2.URLError: <urlopen error no host given>
Following the traceback, I set breakpoints and print statements file by file and found that the req passed into self.do_open(httplib.HTTPConnection, req) is None. I tried reinstalling Anaconda and also tried the newer API in Python 3.6, with the same result. I have no idea what is causing this — any help would be appreciated.
Problem solved. Tracing the code showed that when the opener's handlers are assembled, a ProxyHandler gets added, and a bug in how that ProxyHandler obtains the proxy settings is what leads to the None args in result = func(*args) above. Why was a ProxyHandler added at all, when I never configured a proxy? Following the code into the urllib module, the error is in the getproxies_registry() method: there, proxyEnable comes back as a unicode-encoded '0', so the subsequent if proxyEnable check evaluates to true. I forced proxyEnable to int at that point and the problem went away. I don't know whether this is a bug in my system or in the Python interpreter.
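The truthiness trap described above can be shown in a few lines. This is a minimal illustration with the value hard-coded as an assumption — on the affected machine it actually comes out of the Windows registry lookup inside getproxies_registry():

```python
# Hypothetical stand-in for what the registry lookup returns: the ProxyEnable
# value arrives as the string u'0' instead of the integer 0.
proxy_enable = u'0'

# Any non-empty string is truthy, so "if proxy_enable:" wrongly takes the
# proxy branch even though proxies are disabled.
print(bool(proxy_enable))       # True  -> the buggy branch is taken

# Forcing the value to int first restores the intended check.
print(bool(int(proxy_enable)))  # False -> proxy detection correctly skipped
```

This is why casting proxyEnable to int before the check fixes the error.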