Follow the WeChat official account "Brother K crawler" (K哥爬虫) for more technical write-ups on advanced crawling and JS/Android reverse engineering!

Disclaimer

All content in this article is for learning and exchange only. Captured packets, sensitive URLs, and data interfaces have been desensitized. Commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, contact me through the official account and it will be deleted immediately!

Reverse-engineering goal

The goal this time is to crawl job listings from Lagou.com. The key parameters involved are:

  • Request header parameters: traceparent, X-K-HEADER, X-S-HEADER, X-SS-REQ-HEADER, x-anit-forge-code, x-anit-forge-token
  • Cookie values: user_trace_token, X_HTTP_TOKEN, __lg_stoken__
  • AES encryption of the POST request data and AES decryption of the returned (encrypted) job information

There are many parameters, but some of them can be hard-coded or left out entirely: the three cookie values and the X-K-HEADER and X-SS-REQ-HEADER request headers can be fixed, and x-anit-forge-code and x-anit-forge-token are optional. Even so, this article traces the source of every parameter, so you can handle each one as your situation requires.

Also, even with every parameter in place, Lagou.com rate-limits each IP: after a handful of requests you will be asked to log in. You can either crawl through proxies or copy the cookies of a logged-in account into the code to lift the restriction. Requests made while logged in carry two extra request headers, x-anit-forge-code and x-anit-forge-token; in testing, neither turned out to be required.

Packet capture analysis

Search for a position and turn the page, and you will see an Ajax request named positionAjax.json. It is easy to tell that this request returns the position information. The key parameters are boxed in the figures.

Headers and cookies of a normal request (not logged in, IP not rate-limited):

01

02

Headers and cookies of a request made after logging in (IP rate-limited):

03

04

Both request data and return data are encrypted:

05

Cookie parameters

First look at the key parameters in cookies, mainly user_trace_token , X_HTTP_TOKEN and __lg_stoken__ .

user_trace_token

This value is returned by an interface, which you can find with a direct search, as shown in the following figures:

06

07

In the request parameters, time is a timestamp and the value of a is arbitrary (it can even be left out without any effect); all other values are fixed. The key code is as follows:

import time

import requests

# UA is the User-Agent string defined elsewhere in the full code


def get_user_trace_token() -> str:
    # Get user_trace_token from the cookie
    json_url = "https://a.脱敏处理.com/json"
    headers = {
        "Host": "a.脱敏处理.com",
        "Referer": "https://www.脱敏处理.com/",
        "User-Agent": UA
    }
    params = {
        "lt": "trackshow",
        "t": "ad",
        "v": 0,
        "dl": "https://www.脱敏处理.com/",
        "dr": "https://www.脱敏处理.com",
        "time": str(int(time.time() * 1000))
    }
    response = requests.get(url=json_url, headers=headers, params=params)
    user_trace_token = response.cookies.get_dict()["user_trace_token"]
    return user_trace_token

X_HTTP_TOKEN

A direct search turns up nothing, so go straight to a hook. If hooking is unfamiliar, see Brother K's previous articles, which cover it in detail; I won't repeat it here.

(function () {
    'use strict';
    var cookieTemp = "";
    Object.defineProperty(document, 'cookie', {
        set: function (val) {
            console.log('Hook captured cookie set ->', val);
            if (val.indexOf('X_HTTP_TOKEN') != -1) {
                debugger;
            }
            cookieTemp = val;
            return val;
        },
        get: function () {
            return cookieTemp;
        }
    });
})();

08

Following the call stack upward leads to a small piece of OB-obfuscated code. _0x32e0d2 holds the final X_HTTP_TOKEN value, as shown in the following figure:

09

Go all in: the code is only a little over 300 lines, so there is no need to extract pieces of it. Copy the whole thing and run it locally. It fails with document is undefined. Locating the code and debugging with a breakpoint shows that it runs a regular expression against the cookie to pick out the value of user_trace_token, so we just define it ourselves: var document = {"cookie": cookie}, passing user_trace_token in as the cookie value.

10

With document stubbed, running it again fails with window is not defined. Locate the source again, as shown in the following figure:

11

Analysis: the code takes the XMLHttpRequest object from window, sends an Ajax GET request to the wafcheck.json interface, then reads the Date value from the response headers and assigns it to _0x309ac8. The Date header is expressed in GMT, so it reads 8 hours behind Beijing time, but that does not matter here. In new Date(_0x309ac8[_0x3551('0x2d')](/-/g, '/')) the obfuscated call is just replace(), which swaps - for / before the string is turned into a Date object (_0x150c4d); Date.parse() then converts that into a timestamp (_0x4e6d5d), which is returned in seconds. The end result is simply the current time, so there is no need for anything this complicated. Just rewrite the _0x89ea42 method locally:

// Original method
// function _0x89ea42() {
//     var _0x372cc0 = null;
//     if (window[_0x3551('0x26')]) {
//         _0x372cc0 = new window[(_0x3551('0x26'))]();
//     } else {
//         _0x372cc0 = new ActiveObject(_0x3551('0x27'));
//     }
//     _0x372cc0[_0x3551('0x28')](_0x3551('0x29'), _0x3551('0x2a'), ![]);
//     _0x372cc0[_0x3551('0x2b')](null);
//     var _0x309ac8 = _0x372cc0[_0x3551('0x2c')]('Date');
//     var _0x150c4d = new Date(_0x309ac8[_0x3551('0x2d')](/-/g, '/'));
//     var _0x4e6d5d = Date[_0x3551('0x2e')](_0x150c4d);
//     return _0x4e6d5d / 0x3e8;
// }

// Local rewrite
function _0x89ea42() {
    var _0x150c4d = new Date();
    var _0x4e6d5d = Date.parse(_0x150c4d);
    return _0x4e6d5d / 0x3e8;
}
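
As a cross-check, the rewritten function can be mirrored in Python. This is a minimal sketch, not part of the original code: the name x_http_token_timestamp is made up for illustration, and it assumes, as the local rewrite does, that the local clock is an acceptable stand-in for the server's Date header.

```python
import time


def x_http_token_timestamp() -> int:
    """Whole seconds since the epoch, mirroring Date.parse(new Date()) / 1000.

    Date.parse() on a Date object goes through its string form, which only
    has one-second resolution, so the JS result is always a whole number.
    """
    return int(time.time())


timestamp = x_http_token_timestamp()
print(timestamp)
```

Computing the value on the Python side and passing it into the JS is one option; leaving the rewritten _0x89ea42 in place works just as well.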

Local test OK:

12

__lg_stoken__

The __lg_stoken__ parameter is generated after you click search. A direct search is again useless. Hook it and follow the stack upward, and the place where it is generated is easy to find:

13

14

You can see that d is the value of __lg_stoken__: d = (new g()).a(), g = window.gt, and window.gt actually calls _0x11db59.

Step into the obfuscated JS and you will find that the code at the end is the key part. It uses the prototype object, so we can get __lg_stoken__ directly with window.gt.prototype.a() or (new window.gt).a(), as shown in the following figure:

15

At this point you may want to set a breakpoint and debug to see whether the logic can be extracted, but after a refresh the breakpoint no longer hits: the obfuscated JS file keeps changing, so the earlier breakpoint is useless. You might then think of overriding the JS with a local copy under a fixed file name so that breakpoints work. If you do, the page just keeps loading after a refresh, and the console shows an error. The reason is that the obfuscated JS changes not only its file name but also its content, and not merely the variable names: some values change dynamically too, for example:

16

We won't worry about that here. Copy all the obfuscated code and debug it locally first to see whether it runs. During debugging it reports window is not defined and then Cannot read properties of undefined (reading 'hostname'). Locating the code shows that it reads window.location.hostname, so just define that locally:

17

Debugging again reports Cannot read properties of undefined (reading 'substr'). substr() is a string method that extracts a given number of characters starting at a given index. Locating the code shows that substr() is called on window.location.search, so we stub that locally in the same way.

18

After the parameters are filled locally, the running result is consistent with the webpage:

19

The result is fine, but that raises another question: window.location.search is the value to be encrypted, so where does it come from? A direct search shows that it is the target address of a 302 redirect from an interface, so just take it from there. The redirect address is built from your search content, so different search parameters give different redirect addresses:

20

Once debugging succeeds, pick a different search keyword, feed the resulting 302 redirect address into this JS, and the encryption fails with an error. That means the parameter passed in must match the content of the obfuscated JS that is served with it. The approach here is to request the JS file directly, add the window stub and a function that returns __lg_stoken__, and then execute the whole thing.

The key code to obtain __lg_stoken__ is as follows (original_data is the original search data):

def get_lg_stoken(original_data: dict) -> str:
    # Get __lg_stoken__ for the cookie
    token_url = "https://www.脱敏处理.com/wn/jobs"
    token_headers = {
        "Host": "www.脱敏处理.com",
        "Referer": "https://www.脱敏处理.com/",
        "User-Agent": UA
    }
    params = {
        "kd": original_data["kd"],
        "city": original_data["city"]
    }
    token_response = requests.get(url=token_url, params=params, headers=token_headers, cookies=global_cookies, allow_redirects=False)
    if token_response.status_code != 302:
        raise Exception("Failed to get the redirect URL! Check whether global_cookies already contains __lg_stoken__!")
    # Address of the 302 redirect
    security_check_url = token_response.headers["Location"]
    if "login" in security_check_url:
        raise Exception("This IP has been blocked and a login is required! Fill in the cookies from a logged-in session, or add a proxy!")
    parse_result = parse.urlparse(security_check_url)
    # The query string of the URL is the value to be encrypted
    security_check_params = parse_result.query
    # The name parameter is the file name of the obfuscated JS
    security_check_js_name = parse.parse_qs(security_check_params)["name"][0]

    # Request the obfuscated JS
    js_url = "https://www.脱敏处理.com/common-sec/dist/" + security_check_js_name + ".js"
    js_headers = {
        "Host": "www.脱敏处理.com",
        "Referer": security_check_url,
        "User-Agent": UA
    }
    js_response = requests.get(url=js_url, headers=js_headers, cookies=global_cookies).text
    # Complete the JS: add the window stub and a function that returns the value of __lg_stoken__
    lg_js = """
    window = {
        "location": {
            "hostname": "www.脱敏处理.com",
            "search": '?%s'
        }
    }
    function getLgStoken(){
        return window.gt.prototype.a()
    }
    """ % security_check_params + js_response

    lg_stoken = execjs.compile(lg_js).call("getLgStoken")
    return lg_stoken
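
The URL handling in get_lg_stoken can be tried in isolation with the standard library alone. The sketch below is illustrative only: the helper name build_lg_stoken_prelude and the example.com redirect URL are made up, and json.dumps is swapped in for the '?%s' string formatting so that quotes in the query string cannot break the generated JS.

```python
import json
from typing import Tuple
from urllib import parse


def build_lg_stoken_prelude(location: str, hostname: str) -> Tuple[str, str]:
    """Return (js_file_name, window_stub_js) for a 302 Location URL."""
    query = parse.urlparse(location).query
    # The "name" query parameter is the file name of the obfuscated script.
    js_name = parse.parse_qs(query)["name"][0]
    window_stub = {"location": {"hostname": hostname, "search": "?" + query}}
    # json.dumps emits a valid JS object literal and escapes quotes for us.
    prelude = "var window = " + json.dumps(window_stub) + ";"
    return js_name, prelude


# Made-up redirect target, for illustration only.
demo_location = "https://www.example.com/check.html?seed=abc123&name=demo_file&ts=1"
demo_name, demo_prelude = build_lg_stoken_prelude(demo_location, "www.example.com")
print(demo_name)
print(demo_prelude)
```

The returned prelude would be placed in front of the downloaded script, in the same position as the hand-written window block in lg_js.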

Request header parameters

There are many request header parameters: traceparent, X-K-HEADER, X-S-HEADER, X-SS-REQ-HEADER, x-anit-forge-code, and x-anit-forge-token. In testing, the last two (the x-anit ones) only appear after logging in and are not required, but let's analyze them anyway.

x-anit-forge-code / x-anit-forge-token

Both values are generated the first time you click search. On the first visit to the search interface, the returned HTML has JSON data embedded in it, and its submitCode and submitToken fields are the values of x-anit-forge-code and x-anit-forge-token, as shown in the following figure:

21

When requesting this interface, be sure to send the post-login cookies. Only four values matter; the correct cookies look like this:

cookies = {
    "login": "true",
    "gate_login_token": "54a31e93aa904a6bb9731bxxxxxxxxxxxxxx",
    "_putrc": "9550E53D830BE8xxxxxxxxxxxxxx",
    "JSESSIONID": "ABAAAECABIEACCA79BFxxxxxxxxxxxxxx"
}

Note that JSESSIONID exists even when you are not logged in, but it has to be carried through the login so that it gets activated. If the submitCode and submitToken you get back are empty, JSESSIONID is probably invalid. All of the values above must be copied after logging in!

The key code to obtain x-anit-forge-code and x-anit-forge-token is as follows (original_data is the original search data):

def update_x_anit(original_data: dict) -> None:
    # Update x-anit-forge-code and x-anit-forge-token
    url = "https://www.脱敏处理.com/wn/jobs"
    headers = {
        "Host": "www.脱敏处理.com",
        "Referer": "https://www.脱敏处理.com/",
        "User-Agent": UA
    }
    params = {
        "kd": original_data["kd"],
        "city": original_data["city"]
    }
    response = requests.get(url=url, params=params, headers=headers, cookies=global_cookies)
    tree = etree.HTML(response.text)
    next_data_json = json.loads(tree.xpath("//script[@id='__NEXT_DATA__']/text()")[0])
    submit_code = next_data_json["props"]["tokenData"]["submitCode"]
    submit_token = next_data_json["props"]["tokenData"]["submitToken"]
    # Note: JSESSIONID must be the one validated by a login!
    if not submit_code or not submit_token:
        raise Exception("submitCode & submitToken are empty, check whether JSESSIONID is correct!")
    global x_anit
    x_anit["x-anit-forge-code"] = submit_code
    x_anit["x-anit-forge-token"] = submit_token
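
For readers without lxml, the same extraction can be sketched with the standard library. This is an assumption-laden example, not the author's code: extract_submit_tokens is a hypothetical helper, the HTML fragment and token values are fabricated, and a regular expression stands in for the XPath query (it assumes the id attribute is written with double quotes).

```python
import json
import re
from typing import Tuple

NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_submit_tokens(html: str) -> Tuple[str, str]:
    """Pull submitCode and submitToken out of the embedded __NEXT_DATA__ JSON."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("__NEXT_DATA__ script tag not found")
    token_data = json.loads(match.group(1))["props"]["tokenData"]
    return token_data["submitCode"], token_data["submitToken"]


# Fabricated page fragment with placeholder values, for illustration only.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"tokenData": {"submitCode": "12345678", "submitToken": "demo-token"}}}'
    "</script></body></html>"
)
code, token = extract_submit_tokens(sample_html)
print(code, token)
```

An HTML parser such as the lxml call in update_x_anit is more robust against markup changes; the regex version only saves a dependency.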

traceparent

Hook again and follow the call stack:

(function () {
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        console.log('Hook captured %s set -> %s', key, value);
        if (key == 'traceparent') {
            debugger;
        }
        return org.apply(this, arguments);
    };
})();

22

23

Look at the code above: in the ternary expression, t.sampled is true, so e is 01 and n is t.id. The values to focus on are t.traceId and t.id. Following the stack is tedious here, so search for the keywords directly to find where they are generated:

24

25

Just extract the E() method and rewrite it:

getRandomValues = require('get-random-values')

function E(t) {
    for (var b = [], w = 0; w < 256; ++w)
            b[w] = (w + 256).toString(16).substr(1);
    var T = new Uint8Array(16);
    return function(t) {
        for (var e = [], n = 0; n < t.length; n++)
            e.push(b[t[n]]);
        return e.join("")
    }(getRandomValues(T)).substr(0, t)
}

function getTraceparent(){
    return "00-" + E() + "-" + E(16) + "-" + "01"
}

// Test output
// console.log(getTraceparent())
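
The same header can also be produced without Node. A minimal Python sketch, assuming only the layout that getTraceparent() builds (prefix 00, 32 hex characters, 16 hex characters, suffix 01); secrets.token_hex is swapped in for get-random-values, and the function name get_traceparent is hypothetical:

```python
import re
import secrets


def get_traceparent() -> str:
    """Build "00-<32 hex chars>-<16 hex chars>-01" from random bytes."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars, like E()
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars, like E(16)
    return "00-" + trace_id + "-" + span_id + "-01"


traceparent = get_traceparent()
print(traceparent)
```

Both fields are purely random, so nothing ties them to the rest of the request; keeping the JS version is equally valid if everything else already runs through execjs.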

X-K-HEADER / X-SS-REQ-HEADER

X-K-HEADER and X-SS-REQ-HEADER carry the same data; the latter just wraps it in a key-value pair. First do a global search for the keyword: both values turn out to be read from the local cache, which is empty once the cookies are cleared. Then search for the value itself: it is the secretKeyValue returned by the agreement interface, which is what we want. The browser's own capture may not find it with a direct search, but a packet capture tool such as Fiddler will, as shown in the following figure:

26

This interface takes a POST request whose JSON body carries secretKeyDecode. Search for that keyword directly; there is only one hit. Locate it and follow the stack:

27

zt() reads the value from the local cache, while At() regenerates it:

28

It is clear here: t is a 32-character random string assigned to aesKey. aesKey is then RSA-encrypted and assigned to rsaEncryptData, and rsaEncryptData is what the request to the agreement interface sends as secretKeyDecode.

One thing to note up front: the request data and the returned data of the final job search are AES-encrypted and decrypted with this aesKey, and the X-S-HEADER request header uses it as well. If the key has not been RSA-encrypted and verified through the agreement interface, it is invalid. In other words, the agreement interface is there not only to obtain X-K-HEADER and X-SS-REQ-HEADER but also to activate this aesKey.

The JS code and Python code of this part are roughly as follows:

JSEncrypt = require("jsencrypt")

function getAesKeyAndRsaEncryptData() {
    var aesKey = function (t) {
        for (var e = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=", r = "", n = 0; n < t; n++) {
            var i = Math.floor(Math.random() * e.length);
            r += e.substring(i, i + 1)
        }
        return r
    }(32);

    var e = new JSEncrypt();
    e.setPublicKey("-----BEGIN PUBLIC KEY-----MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnbJqzIXk6qGotX5nD521Vk/24APi2qx6C+2allfix8iAfUGqx0MK3GufsQcAt/o7NO8W+qw4HPE+RBR6m7+3JVlKAF5LwYkiUJN1dh4sTj03XQ0jsnd3BYVqL/gi8iC4YXJ3aU5VUsB6skROancZJAeq95p7ehXXAJfCbLwcK+yFFeRKLvhrjZOMDvh1TsMB4exfg+h2kNUI94zu8MK3UA7v1ANjfgopaE+cpvoulg446oKOkmigmc35lv8hh34upbMmehUqB51kqk9J7p8VMI3jTDBcMC21xq5XF7oM8gmqjNsYxrT9EVK7cezYPq7trqLX1fyWgtBtJZG7WMftKwIDAQAB-----END PUBLIC KEY-----");
    var rsaEncryptData = e.encrypt(aesKey);

    return {
        "aesKey": aesKey,
        "rsaEncryptData": rsaEncryptData
    }
}

// Test output
// console.log(getAesKeyAndRsaEncryptData())

def update_aes_key() -> None:
    # Get the AES key via JS and activate it through the interface; once activated, the interface returns a secretKeyValue that later request headers use
    global aes_key, secret_key_value
    url = "https://gate.脱敏处理.com/system/agreement"
    headers = {
        "Content-Type": "application/json",
        "Host": "gate.脱敏处理.com",
        "Origin": "https://www.脱敏处理.com",
        "Referer": "https://www.脱敏处理.com/",
        "User-Agent": UA
    }
    encrypt_data = lagou_js.call("getAesKeyAndRsaEncryptData")
    aes_key = encrypt_data["aesKey"]
    rsa_encrypt_data = encrypt_data["rsaEncryptData"]
    data = {"secretKeyDecode": rsa_encrypt_data}
    response = requests.post(url=url, headers=headers, json=data).json()
    secret_key_value = response["content"]["secretKeyValue"]
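
The key-generation half of getAesKeyAndRsaEncryptData can be sketched in Python too. This is only an illustration: generate_aes_key is a hypothetical name, secrets.choice replaces Math.random, and the RSA step is deliberately left to the JS because the Python standard library has no RSA implementation.

```python
import secrets

# Same 65-character alphabet as the JS version.
AES_KEY_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/="
)


def generate_aes_key(length: int = 32) -> str:
    """Return a random string of `length` characters drawn from the alphabet."""
    return "".join(secrets.choice(AES_KEY_ALPHABET) for _ in range(length))


demo_key = generate_aes_key()
print(demo_key)
```

A key produced this way would still have to be RSA-encrypted with the site's public key and sent to the agreement interface before it is of any use.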

X-S-HEADER

X-S-HEADER changes every time you turn the page. Search for the keyword directly to locate it:

29

30

A SHA256 hash is computed in the middle, and the final return value, Rt(JSON.stringify({originHeader: JSON.stringify(e), code: t})), is the value of X-S-HEADER. Rt() is an AES encryption function, which is the key piece. Vt(r) is a URL: positionAjax.json when searching for positions, companyAjax.json when searching for companies, so adjust it to your case. Lt(t) is the search information as a string, including the city, page number, keywords, and so on.

The JS code to get X-S-HEADER is roughly as follows:

CryptoJS = require('crypto-js')

jt = function(aesKey, originalData, u) {
    var e = {deviceType: 1}
      , t = "".concat(JSON.stringify(e)).concat(u).concat(JSON.stringify(originalData))
      , t = (t = t, null === (t = CryptoJS.SHA256(t).toString()) || void 0 === t ? void 0 : t.toUpperCase());

    return Rt(JSON.stringify({
        originHeader: JSON.stringify(e),
        code: t
    }), aesKey)
}

Rt = function (t, aesKey) {
    var Ot = CryptoJS.enc.Utf8.parse("c558Gq0YQK2QUlMc"),
        Dt = CryptoJS.enc.Utf8.parse(aesKey),
        t = CryptoJS.enc.Utf8.parse(t);
    t = CryptoJS.AES.encrypt(t, Dt, {
        iv: Ot,
        mode: CryptoJS.mode.CBC,
        padding: CryptoJS.pad.Pkcs7
    });
    return t.toString()
};

function getXSHeader(aesKey, originalData, u){
    return jt(aesKey, originalData, u)
}

// Test sample
// var url = "https://www.脱敏处理.com/jobs/v2/positionAjax.json"
// var aesKey = "dgHY1qVeo/Z0yDaF5WV/EEXxYiwbr5Jt"
// var originalData = {"first": "true", "needAddtionalResult": "false", "city": "全国", "pn": "2", "kd": "Java"}
// console.log(getXSHeader(aesKey, originalData, url))
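
The SHA256 part of jt() is easy to reproduce with Python's standard library, which helps when checking what goes into the digest. A sketch under assumptions: js_stringify and x_s_header_plaintext are made-up names, json.dumps with compact separators is used to approximate JSON.stringify, the example.com URL is a placeholder, and the final AES step (Rt()) is not reproduced here.

```python
import hashlib
import json


def js_stringify(obj) -> str:
    """Approximate JSON.stringify: no extra whitespace, non-ASCII kept as-is."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def x_s_header_plaintext(original_data: dict, url: str) -> str:
    """Build the JSON string that jt() hands to Rt() for AES encryption."""
    origin_header = js_stringify({"deviceType": 1})
    digest_input = origin_header + url + js_stringify(original_data)
    code = hashlib.sha256(digest_input.encode("utf-8")).hexdigest().upper()
    return js_stringify({"originHeader": origin_header, "code": code})


demo_data = {"first": "true", "needAddtionalResult": "false",
             "city": "全国", "pn": "2", "kd": "Java"}
demo_url = "https://www.example.com/jobs/v2/positionAjax.json"  # placeholder URL
demo_plain = x_s_header_plaintext(demo_data, demo_url)
print(demo_plain)
```

Because the digest covers the serialized search data, the key order and spacing of that JSON must match what the browser produces; any difference changes the code field.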

Request/return data decryption

In the earlier packet capture we saw that positionAjax.json is a POST request, that the data in Form Data is encrypted, and that the returned data is encrypted as well. The request header analysis already ran into AES encryption and decryption, so search directly for AES.encrypt and AES.decrypt and debug with breakpoints:

31

32

The logic is straightforward; the JS code for this part is roughly as follows:

CryptoJS = require('crypto-js')

function getRequestData(aesKey, originalData){
    return Rt(JSON.stringify(originalData), aesKey)
}

function getResponseData(encryptData, aesKey){
    return It(encryptData, aesKey)
}

Rt = function (t, aesKey) {
    var Ot = CryptoJS.enc.Utf8.parse("c558Gq0YQK2QUlMc"),
        Dt = CryptoJS.enc.Utf8.parse(aesKey),
        t = CryptoJS.enc.Utf8.parse(t);
    t = CryptoJS.AES.encrypt(t, Dt, {
        iv: Ot,
        mode: CryptoJS.mode.CBC,
        padding: CryptoJS.pad.Pkcs7
    });
    return t.toString()
};

It = function(t, aesKey) {
    var Ot = CryptoJS.enc.Utf8.parse("c558Gq0YQK2QUlMc"),
    Dt = CryptoJS.enc.Utf8.parse(aesKey);
    t = CryptoJS.AES.decrypt(t, Dt, {
        iv: Ot,
        mode: CryptoJS.mode.CBC,
        padding: CryptoJS.pad.Pkcs7
    }).toString(CryptoJS.enc.Utf8);
    try {
        t = JSON.parse(t)
    } catch (t) {}
    return t
}

// Test sample. Note: encryptedData is truncated here because it is too long, so running the decryption as-is raises an error
// var aesKey = "dgHY1qVeo/Z0yDaF5WV/EEXxYiwbr5Jt"
// var encryptedData = "r4MqbduYxu3Z9sFL75xDhelMTCYPHLluKaurYgzEXlEQ1Rg......"
// var originalData = {"first": "true", "needAddtionalResult": "false", "city": "全国", "pn": "2", "kd": "Java"}
// console.log(getRequestData(aesKey, originalData))
// console.log(getResponseData(encryptedData, aesKey))
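
If the decrypted text is ever handled on the Python side, the try/catch at the end of It() translates directly. A small sketch, with parse_decrypted as a hypothetical helper name; it covers only the JSON fallback, not the AES decryption itself:

```python
import json
from typing import Any


def parse_decrypted(plaintext: str) -> Any:
    """Return parsed JSON when possible, otherwise the raw string, like It()."""
    try:
        return json.loads(plaintext)
    except json.JSONDecodeError:
        # Mirrors the empty catch block in It(): keep the text unchanged.
        return plaintext


print(parse_decrypted('{"content": {"positionResult": {"resultSize": 15}}}'))
print(parse_decrypted("not json"))
```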

The rough Python code is as follows:

def get_header_params(original_data: dict) -> dict:
    # Request header parameters needed by the later data request
    # Position search URL; for a company search use https://www.脱敏处理.com/jobs/companyAjax.json instead, adjust as needed
    u = "https://www.脱敏处理.com/jobs/v2/positionAjax.json"
    return {
        "traceparent": lagou_js.call("getTraceparent"),
        "X-K-HEADER": secret_key_value,
        "X-S-HEADER": lagou_js.call("getXSHeader", aes_key, original_data, u),
        "X-SS-REQ-HEADER": json.dumps({"secret": secret_key_value})
    }


def get_encrypted_data(original_data: dict) -> str:
    # AES-encrypt the original data
    encrypted_data = lagou_js.call("getRequestData", aes_key, original_data)
    return encrypted_data


def get_data(original_data: dict, encrypted_data: str, header_params: dict) -> dict:
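    # Send the encrypted request data with the full set of headers, get the ciphertext back, and AES-decrypt it into plaintext job information
    url = "https://www.脱敏处理.com/jobs/v2/positionAjax.json"
    # Plain concatenation: parse.urljoin() would treat the query string as a relative path and replace "jobs"
    referer = "https://www.脱敏处理.com/wn/jobs?" + parse.urlencode(original_data)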
    headers = {
        # "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Host": "www.脱敏处理.com",
        "Origin": "https://www.脱敏处理.com",
        "Referer": referer,
        "traceparent": header_params["traceparent"],
        "User-Agent": UA,
        "X-K-HEADER": header_params["X-K-HEADER"],
        "X-S-HEADER": header_params["X-S-HEADER"],
        "X-SS-REQ-HEADER": header_params["X-SS-REQ-HEADER"],
    }
    # Add x-anit-forge-code and x-anit-forge-token
    headers.update(x_anit)

    data = {"data": encrypted_data}
    response = requests.post(url=url, headers=headers, cookies=global_cookies, data=data).json()
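    if "status" in response:
        # "操作太频繁" is the server's "operating too frequently" message
        if not response["status"] and "操作太频繁" in response["msg"]:
            raise Exception("Failed to get data! msg: %s! Try filling in the cookies from a logged-in session, or add a proxy!" % response["msg"])
        else:
            raise Exception("Unexpected response! Check that the request data is complete!")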
    else:
        response_data = response["data"]
        decrypted_data = lagou_js.call("getResponseData", response_data, aes_key)
        return decrypted_data

Finally, with all the code put together, the data is fetched successfully:

33

Reverse Tips

In the browser developer tools, the Application - Storage panel can clear all cookies with one click and lets you set a custom storage quota:

34

Storage - Cookies shows all cookies for each site. A checked HttpOnly box means the cookie was set by the server and cannot be read from JavaScript. You can right-click a cookie to jump to the requests that carry it, edit its value in place, or delete just that one cookie. This is handy when you are logged in, need to clear a single cookie, and don't want to log in again.

35

Full code

The article only gives key snippets, which cannot be run as-is, and some details may have been left out. The complete code, with detailed comments, is on GitHub; stars are welcome. All content is for learning and exchange only; commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. Please delete any files downloaded from the repository within 24 hours after studying them!

Repository address: https://github.com/kgepachong/crawler/

Common problems

  • The JS code references three libraries. Install them with npm install. If a library still cannot be found after installation, it is a path problem: run the install command in the current directory, or specify the full path in the Python code (search online for how).
  • The jsencrypt library may fail with window is not defined when run locally. Add var window = global; to the \node_modules\jsencrypt\bin\jsencrypt.js source to fix it. This library implements RSA encryption; there are of course many other implementations and libraries.
  • When execjs runs JS, it may raise the encoding error "gbk" can't decode byte... . There are two fixes: either open the standard library's subprocess.py, find encoding=None and change it to encoding='utf-8', or add the following to your Python code:
import subprocess
from functools import partial

subprocess.Popen = partial(subprocess.Popen, encoding="utf-8")

