Python multithreaded GET requests fail with urllib3.connectionpool "Failed to parse headers"

I'm new here, please bear with me.

Brief description of the requirements

Use multiple threads to send GET requests to a given batch of URLs (none of the URLs are duplicates).

Problem description

The requests call already has timeout set to 3 seconds. After the program starts it prints results normally at first, then for a while no request results appear. Inspecting the process shows a large number of threads that never close, and after running for some time the program raises the error in the title (the full error message is pasted below). From similar cases this might be a keep-alive issue, so I set req.keep_alive = False, but that didn't help.

Could anyone help me figure out what is causing this? Many thanks, and apologies if the description is unclear.

The program code is pasted at the end.

The full error message is as follows

The URLs in the output have been replaced with XXX; the URL in the error message was accessible normally at the time.
2020-01-18 02:33:55,363 urllib3.connectionpool [WARNING] - Failed to parse headers (url=https://XXX/XXX.conf): [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS\x16.\x03UiqØÉó\x0c\x9b®'Oj\x15þ\x06\x1b\x93\x18\x8dçøÈþjw\x89è\\\x0bõ\x7f\x10Q*¢\xa0\x06ÿm/\x02^(aÐ\x12\x9b˯ÈkfÙSÉ\x81\x9a8§\xa0\\\x9938g\x88Âdñ=ÊaÑuv®\x8e^õ2\x9a»»\x1cÎê¾ásóÆðAÅ:÷ú¯·2®\x1fyä{¼ãÀ¢¦,ÃR7L\x9ff!`\x15\x81<©*»{ï(+.ÐW½Ñ»ß\x8dÅ.\x1c¨·¢\x91àr´cÙÆ=-ÄÜ¡;HttpOnly;Path=/;Secure\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_EPAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_USER=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_BASEURL=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CsrfToken=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CtxsAuthId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: ASP.NET_SessionId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TMAA=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_TMAS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nConnection: close\r\nContent-Length: 551\r\nCache-control: no-cache, no-store, must-revalidate\r\nPragma: no-cache\r\nContent-Type: text/html\r\n\r\n"
Traceback (most recent call last):
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 441, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\util\response.py", line 71, in assert_header_parsing
    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS

Program code


import logging
import time
from threading import Thread

import requests

logging.basicConfig(level=logging.INFO)  # minimal logger setup; the original config is not shown in the post
logger = logging.getLogger(__name__)


def checking(url):
    # business logic
    try:
        url_new = '%s/xxx.html' % url
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
        }
        req = requests.session()
        req.keep_alive = False  # attempt to disable keep-alive in urllib3
        res = req.get(url_new, headers=header, timeout=3, verify=False, allow_redirects=False)

        if 'target_text' in str(res.content):
            logger.info('[+] task %s is SUCC' % url)
        else:
            logger.info('[-] task %s is FAIL' % url)
    except Exception:
        pass  # ignore any request errors


def get_url_list(filename):
    url_list = []
    with open(filename, 'r', encoding='utf-8') as file:
        while True:
            url = file.readline().strip()
            if not url:
                break
            else:
                if url != '': url_list.append(url)
                print('\rRead %s URLs' % len(url_list), end='', flush=True)
    print('')
    return url_list

if __name__ == '__main__':

    # read the URL list
    url_list = get_url_list('data/host.txt')
    thread_list = []

    for url in url_list:
        thread = Thread(target=checking, args=(url,))
        thread.start()  # start() returns None, so keep the Thread object itself for join()
        thread_list.append(thread)
        time.sleep(0.05)

    for th in thread_list:
        th.join()

    print('All threads finished')
1 answer

Try writing it this way:

with requests.session() as req:
    pass
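
For context, here is a minimal sketch of how that pattern could be applied to the checking() function from the question (url_new, header and logger are names taken from the question's code; the rest is just an assumption about how to wire it in). Using the session as a context manager guarantees the session and its pooled connections are closed as soon as each request finishes, rather than relying on req.keep_alive = False.

# Sketch only: the answer's with-statement pattern applied to checking();
# url_new, header and logger come from the question's code.
def checking(url):
    try:
        url_new = '%s/xxx.html' % url
        header = {'User-Agent': 'Mozilla/5.0'}
        # The with-block closes the session (and its underlying connection pool)
        # when the request is done, even if an exception is raised.
        with requests.session() as req:
            res = req.get(url_new, headers=header, timeout=3,
                          verify=False, allow_redirects=False)
            if 'target_text' in str(res.content):
                logger.info('[+] task %s is SUCC' % url)
            else:
                logger.info('[-] task %s is FAIL' % url)
    except Exception:
        pass

Closing the session after every request also means a connection whose response data was left in a garbled state (as in the warning above) is never reused for the next request.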