Help needed: Python crawler can't fetch a web page

https://sph.uth.edu/retnet/di...
Here is the code:

import urllib2


url = 'https://sph.uth.edu/retnet/disease.htm'
print url

request = urllib2.Request(url)
response = urllib2.urlopen(request)
print response.read()

The error output:

D:\python.exe C:/Users/annab/PycharmProjects/untitled/bmm/pc
https://sph.uth.edu/retnet/disease.htm
Traceback (most recent call last):
  File "C:/Users/annab/PycharmProjects/untitled/bmm/pc", line 8, in <module>
    response = urllib2.urlopen(request)
  File "D:\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "D:\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "D:\lib\urllib2.py", line 409, in _open
    '_open', req)
  File "D:\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "D:\lib\urllib2.py", line 1181, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "D:\lib\urllib2.py", line 1148, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:499: EOF occurred in violation of protocol>

Process finished with exit code 1

Any help from the experts would be much appreciated.

3 answers

For HTTPS, the requests module is more convenient:


Python 3:

>>> url = 'https://sph.uth.edu/retnet/disease.htm'
>>> import requests as req
>>> rsp=req.get(url)
>>> rsp
<Response [200]>
>>> print(rsp.text[:200])
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<!-- Top of introduction -->
<HEAD>
  <TITLE>RetNet: Disease Table</TITLE>
  <meta h
>>>

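For comparison, the same request can be made with only the Python 3 standard library, where urllib2 was split into urllib.request and urllib.error. This is a minimal sketch, assuming Python 3.4.3 or later (for the context argument of urlopen); fetch and make_context are hypothetical helper names, and the 10-second timeout and UTF-8 fallback are arbitrary choices, not anything from the thread.

```python
import http.client
import ssl
import urllib.error
import urllib.request

URL = "https://sph.uth.edu/retnet/disease.htm"


def make_context():
    """Return an SSL context that verifies certificates and hostnames."""
    # create_default_context() loads the system CA store, requires a valid
    # certificate and enables hostname checking.
    return ssl.create_default_context()


def fetch(url, timeout=10):
    """Download url and return the body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout, context=make_context()) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    try:
        print(fetch(URL)[:200])
    except (OSError, http.client.HTTPException) as exc:
        # urlopen wraps connection and TLS handshake failures in URLError
        # (an OSError subclass), much as urllib2 did in the traceback above.
        print("request failed:", exc)
```

If this prints the page header on Python 3 while the Python 2.7 script still fails, the problem lies in the old interpreter's SSL support rather than in the code.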
import requests
import sys
s = requests.session()
s.keep_alive = False
print sys.modules['requests']
url = 'https://sph.uth.edu/retnet/disease.htm'
rsp=requests.get(url)
print (rsp.text[:200])

Still getting an error... could it be a problem with my computer?

C:\Python27\python.exe C:/Users/annab/PycharmProjects/untitled/bmm/pc
<module 'requests' from 'C:\Users\annab\PycharmProjects\untitled\requests\__init__.pyc'>
C:\Users\annab\PycharmProjects\untitled\requests\packages\urllib3\util\ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
C:\Users\annab\PycharmProjects\untitled\requests\packages\urllib3\util\ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Traceback (most recent call last):
  File "C:/Users/annab/PycharmProjects/untitled/bmm/pc", line 7, in <module>
    rsp=requests.get(url)
  File "C:\Users\annab\PycharmProjects\untitled\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\annab\PycharmProjects\untitled\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\annab\PycharmProjects\untitled\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\annab\PycharmProjects\untitled\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\annab\PycharmProjects\untitled\requests\adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 8] _ssl.c:499: EOF occurred in violation of protocol

Process finished with exit code 1
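The two warnings printed before the traceback come from urllib3 inspecting the interpreter's ssl module: InsecurePlatformWarning is raised when ssl.SSLContext is missing, and SNIMissingWarning when SNI support is missing. On Python 2 both features first appeared in 2.7.9. A small sketch for checking what the interpreter actually running the script provides (run it with the same python.exe, e.g. C:\Python27\python.exe); ssl_report is a hypothetical helper name.

```python
import platform
import ssl


def ssl_report():
    """Summarise the TLS features offered by this interpreter's ssl module."""
    return {
        "python": platform.python_version(),
        "openssl": ssl.OPENSSL_VERSION,
        # A missing SSLContext is what triggers urllib3's InsecurePlatformWarning.
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        # Missing SNI support is what triggers urllib3's SNIMissingWarning.
        "has_sni": getattr(ssl, "HAS_SNI", False),
    }


if __name__ == "__main__":
    for key, value in sorted(ssl_report().items()):
        print("{0:15} {1}".format(key, value))
```

If has_sslcontext or has_sni comes out False, expect the same handshake failure regardless of whether urllib2 or requests is used.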
sudo pip install -U requests[security]

Give this a try. (On Windows there is no sudo; just run the pip command without it.)

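Actually, I ran into this problem before. It turned out to be the wrong version, which kept me from upgrading; in the end I installed Python 2.7.3 through Anaconda and that fixed it. If the above doesn't solve it, give that approach a try.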
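Whichever route you take, check which copy of each package the interpreter actually imports. The print sys.modules['requests'] line in the follow-up shows requests being loaded from C:\Users\annab\PycharmProjects\untitled\requests, which looks like a copy inside the project folder rather than the site-packages install that pip manages, so upgrading with pip may not change what the script imports. In requests releases of that era the [security] extra pulled in pyOpenSSL (imported as OpenSSL), cryptography and idna. A minimal sketch, assuming those module names; locate_modules is a hypothetical helper name.

```python
import importlib


def locate_modules(names):
    """Map each module name to the file it is imported from, or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Built-in modules and namespace packages have no file path.
            found[name] = getattr(module, "__file__", None) or "<no file>"
    return found


if __name__ == "__main__":
    # requests plus the packages its "security" extra installed at the time.
    names = ["requests", "OpenSSL", "cryptography", "idna"]
    for name, path in sorted(locate_modules(names).items()):
        print("{0} -> {1}".format(name, path or "not installed"))
```

If requests still resolves to the project folder after upgrading, renaming or removing that local copy lets the pip-installed version be imported instead.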
