Content scraped with python.requests differs from what the browser shows

Question


Posted on
2016-11-12

I'm using python.requests to scrape the table data at http://app1.sfda.gov.cn/datas..., but the content python.requests returns is different from what I see in the browser. Here is my code:

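import requests


def testLoadRequest():
    params1 = {
        'tableId': '27',
        'tableName': 'TABLE27',
        # GBK bytes of 进口器械 ("imported medical devices"). Passing bytes lets
        # requests percent-encode them once (%BD%F8%BF%DA%C6%F7%D0%B5); the
        # pre-encoded str '%BD%F8...' would be encoded again as %25BD%25F8...
        'tableView': '进口器械'.encode('gbk'),
        'Id': '24583'
    }
    headers1 = {
        # Content-Type describes a request body; a GET sends none, so it is omitted
        'X-Requested-With': 'XMLHttpRequest'
    }
    url1 = 'http://app1.sfda.gov.cn/datasearch/face3/content.jsp'
    try:
        r = requests.get(url1, params=params1, headers=headers1, timeout=10)
        print(r.text)
        print(r.cookies)
        print(r.status_code)
        print(r.url)
    except requests.RequestException as e:
        # Network failures such as "Connection reset by peer" end up here
        print(e)


testLoadRequest()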

This is what the browser shows:
[screenshot of the table as rendered in the browser, not preserved]

But the HTML that python.requests fetches is this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <meta http-equiv="Cache-Control" content="no-store, no-cache, must-revalidate, post-check=0, pre-check=0"/>
    <meta http-equiv="Connection" content="Close"/>
    <script type="text/javascript">function stringToHex(str) {
        var val = "";
        for (var i = 0; i < str.length; i++) {
            if (val == "")val = str.charCodeAt(i).toString(16); else val += str.charCodeAt(i).toString(16);
        }
        return val;
    }
    function YunSuoAutoJump() {
        var width = screen.width;
        var height = screen.height;
        var screendate = width + "," + height;
        var curlocation = window.location.href;
        if (-1 == curlocation.indexOf("security_verify_")) {
            document.cookie = "srcurl=" + stringToHex(window.location.href) + ";path=/;";
        }
        self.location = "/datasearch/face3/content.jsp?tableView=½ø¿ÚÆ÷Ðµ&Id=24583&tableName=TABLE27&tableId=27&security_verify_data=" + stringToHex(screendate);
    }</script>
    <script>setTimeout("YunSuoAutoJump()", 50);</script>
</head>
</html>

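Clearly the fetched content is not the table data. Sometimes the request fails entirely with this error:
('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
Does anyone know why this happens? I'd appreciate any pointers, thanks.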
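The HTML returned to requests is a JavaScript challenge page rather than the table (the function name YunSuoAutoJump suggests the YunSuo web firewall). A browser runs YunSuoAutoJump() after 50 ms; if the URL does not yet contain "security_verify_", the script stores a srcurl cookie holding the hex-encoded current URL, then reloads the page with security_verify_data set to the hex-encoded screen size. requests does not execute JavaScript, so it stops at this page. The minimal Python sketch below reproduces only the string handling the script performs. The helper names and the 1920x1080 default screen size are hypothetical, and whether the server accepts a follow-up request built this way is untested.

```python
import re
from typing import Optional


def string_to_hex(s: str) -> str:
    # Port of the page's stringToHex(): each character's code in lowercase hex,
    # unpadded and concatenated (matches JS charCodeAt() for BMP characters)
    return "".join(format(ord(ch), "x") for ch in s)


def is_challenge_page(html: str) -> bool:
    # The challenge page defines YunSuoAutoJump() instead of returning the table
    return "YunSuoAutoJump" in html


def build_verify_path(html: str, width: int = 1920, height: int = 1080) -> Optional[str]:
    # Capture the path the script assigns to self.location, up to the point
    # where stringToHex(screendate) is appended
    m = re.search(r'self\.location = "([^"]*security_verify_data=)"', html)
    if m is None:
        return None
    # Hypothetical screen size; the browser uses screen.width and screen.height
    return m.group(1) + string_to_hex(f"{width},{height}")


def srcurl_cookie_value(original_url: str) -> str:
    # Value the script writes to the "srcurl" cookie before redirecting
    return string_to_hex(original_url)


if __name__ == "__main__":
    sample = ('function YunSuoAutoJump() { self.location = '
              '"/datasearch/face3/content.jsp?Id=24583&security_verify_data="'
              ' + stringToHex(screendate); }')
    print(is_challenge_page(sample))
    print(build_verify_path(sample))
```

Checking is_challenge_page(r.text) before parsing at least makes it explicit when the challenge page, not the data, came back.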


1 Answer


unofficial


Posted on
2016-11-12

✓ Accepted

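I tested it for you: the problem lies with the source being requested. After I changed url1 to a different link, the data could be scraped successfully.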
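For the intermittent ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')) error from the question, one common mitigation is to let urllib3 retry failed GETs. The sketch below mounts a requests HTTPAdapter with a urllib3 Retry policy on a Session. make_session is a hypothetical helper name, and the retry count and backoff factor are arbitrary example values.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    # urllib3 wraps a mid-request connection reset in a ProtocolError
    # ("Connection aborted."), which it counts as a read error; GET is
    # idempotent, so it is retried, waiting longer between later attempts
    retry = Retry(total=total_retries, backoff_factor=backoff)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to plain and TLS connections
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A call would look like make_session().get(url1, params=params1, headers=headers1, timeout=10). Retries only help if the resets are transient; if the server deliberately drops clients that fail its JavaScript check, retrying will not return the table.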