Follow the WeChat public account K哥爬虫 (Brother K Crawler) for ongoing posts on advanced crawling, JS/Android reverse engineering, and other technical content!

Disclaimer

Everything in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been redacted. Commercial or illegal use is strictly forbidden, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and it will be removed immediately!

Reverse-engineering target

  • Goal: the news feed of PEDATA MAX, a SaaS system in the investment field, whose response data is encrypted
  • Homepage: aHR0cHM6Ly9tYXgucGVkYXRhLmNuL2NsaWVudC9uZXdzL25ld3NmbGFzaA==
  • Interface: aHR0cHM6Ly9tYXgucGVkYXRhLmNuL2FwaS9xNHgvbmV3c2ZsYXNoL2xpc3Q=
  • Parameter to reverse: the encrypted result returned by the request, data: "L+o+YmIyNDE..."

Packet capture analysis

On the homepage, click to view all of the 24-hour news, then scroll down. The items are loaded via Ajax, so filter by XHR in the developer tools and the list request is easy to find. In the response, data is an encrypted string, exor has no obvious purpose yet but may be useful later, and ts is a timestamp, as shown in the following figure:

01.png

The Payload parameters are nothing special, just paging information. Now look at the request headers, and note Cookie and HTTP-X-TOKEN. Accessing this page requires a logged-in account. Cookies are normally what identifies a user, but Brother K's testing found that in this case the HTTP-X-TOKEN header identifies the user, so the Cookie can be omitted. As an aside, the Hm_lvt_xxx and Hm_lpvt_xxx values often seen in cookies are used by Baidu for traffic and advertising statistics and have nothing to do with crawling.

02.png
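The token-only observation above can be sketched in Python. This is a minimal illustration that builds the request without sending it. API_URL and LOGIN_TOKEN are hypothetical placeholders (the real endpoint is redacted in this article), and the payload fields are the ones used in the Python sample code at the end of the article.

```python
import json
import urllib.request

# Hypothetical placeholders: the real endpoint is redacted in this article.
API_URL = "https://example.invalid/api/list"
LOGIN_TOKEN = "replace-with-your-own-token"


def build_list_request(page, page_size=10):
    """Build (but do not send) the paging request with token-only auth."""
    payload = {
        "type": "",
        "module": "LP",
        "page": {"currentPage": page, "pageSize": page_size},
    }
    headers = {
        "Content-Type": "application/json",
        # Identifies the user; note that no Cookie header is set at all.
        "HTTP-X-TOKEN": LOGIN_TOKEN,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


request = build_list_request(page=1)
print(request.get_method(), sorted(request.headers))
```

Paging then only means changing currentPage. Whether the server accepts the request still depends on the token being valid.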

Reversing the encryption

Note that the response is a dictionary. Once the encrypted data arrives, the code has to read values out of it by key, so we can search directly for the key names. Searching for exor gives only one result:

03.png

Here e.data is the returned dictionary, and e.data.data and e.data.exor read the encrypted value and exor respectively. A reasonable guess is that the encrypted value is taken out in order to decrypt it, so set a breakpoint at the end of this function and check whether data has become plaintext once the code has run:

04.png

As expected, Object(p["y"])(e.data.data, e.data.exor) is the decryption call, and Object(p["y"]) actually calls the M method:

05.png

The arguments t and n are the encrypted value and exor respectively, and the JSON.parse(c) that is finally returned is the decrypted result:

06.png

Key code:

function M(t, n) {
    var a = L(Object(s["a"])(), n)
    , r = Y(B(t), a)
    , c = o.a.gunzipSync(e.from(r)).toString("utf-8");
    return JSON.parse(c)
}

Step into each function in turn; the simple ones need no discussion. Select Object(s["a"]) and you will see that it actually calls the c method. Stepping into c shows that it just reads loginToken, which is the HTTP-X-TOKEN request header analyzed earlier and carries your login information.

Background: the window.localStorage property stores data in the browser as key/value pairs. localStorage and sessionStorage are similar, except that localStorage data is kept long term, with no expiration date, until it is deleted manually, while sessionStorage data lasts only for the current session and is deleted when the window or tab is closed.

07.png
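To make the roles of loginToken and exor concrete, here is a minimal Python sketch of what the L function listed in the complete code does with them: each two-digit chunk of exor is read as a position in the token, and the character code at that position becomes one key byte. The name derive_key and the sample token and exor values are made up for illustration.

```python
FIXED_KEY = [7, 65, 75, 31, 71, 101, 57, 0]  # what L returns when exor is "1"


def derive_key(login_token, exor):
    """Python sketch of the JS function L: pick token characters by index."""
    exor = str(exor)
    if exor == "1":
        return list(FIXED_KEY)
    key = []
    for i in range(0, len(exor), 2):
        index = int(exor[i:i + 2])  # each two-digit chunk is a position in the token
        # Masking to one byte mirrors the later Uint8Array truncation in Y.
        key.append(ord(login_token[index]) & 0xFF)
    return key


# Made-up token and exor, for illustration only.
print(derive_key("abcdefghijklmnop", "000315"))  # [97, 100, 112]
```

One difference from the JavaScript original: an index beyond the end of the token raises IndexError here, whereas the JS code would quietly produce NaN.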

Further down is o.a.gunzipSync(). Set it aside for a moment and look first at its argument, e.from(r). Stepping into it may not reveal much, so compare r and e.from(r) directly: both are Uint8Array data with exactly the same contents, as shown in the figure below:

08.png

Now look at o.a.gunzipSync(). It actually calls an anonymous function in chunk-vendors.js. It does not matter if you are unfamiliar with this file. Its code runs to more than 140,000 lines, and the odd name points to bundled vendor modules, so it is easy to guess that this JS was generated by a tool or comes from third parties. In fact, it is a file produced when a Vue application is built. For crawler engineers it is enough to treat it like jquery.js: we generally do not extract code from jquery.js, and there is no point in blindly extracting code from chunk-vendors.js either.

09.png

Focus on the function name, gunzipSync. Even if the rest is unfamiliar, the "zip" in it suggests compression. If that means nothing to you, a quick Baidu search will do:

10.png

The search result gives the Node.js implementation directly: it uses the zlib module. Find any example to see the usage:

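var zlib = require('zlib');
var input = "Nidhi";
// Compress the string, then decompress it again to show the round trip.
var gzi = zlib.gzipSync(input);
// Buffer.from is a static factory, so it is called without `new`.
var decom = zlib.gunzipSync(Buffer.from(gzi)).toString();

console.log(decom);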

Reading further, zlib.gunzipSync() is a built-in API of the zlib module that decompresses a chunk of data with Gunzip. The input can be a Buffer, TypedArray, DataView, ArrayBuffer, or string. The change history in the official documentation shows that Uint8Array input has been supported since v8.0.0:

11.png
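For readers working in Python rather than Node.js, the same round trip can be sketched with the standard-library gzip module. This is only an illustration of the compression step, not code from the target site.

```python
import gzip

original = "Nidhi".encode("utf-8")

# gzip.compress produces a complete gzip stream, like zlib.gzipSync in Node.js.
compressed = gzip.compress(original)

# gzip.decompress accepts any bytes-like object, so a bytearray works as well.
restored = gzip.decompress(bytearray(compressed)).decode("utf-8")

print(restored)  # Nidhi
```

Every gzip stream starts with the two magic bytes 0x1f 0x8b, which is a handy sanity check when the XOR step in front of the decompression produces the wrong bytes.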

Combined with the earlier analysis of r, this means that in Node.js r can be passed straight into zlib.gunzipSync(). Extract the three methods used, L, Y, and B, replace the bundled call with the zlib library, and you get the decompressed data:

function getDecryptedData(encryptedData, exor, loginToken) {
    var a = L(loginToken, exor);
    var r = Y(B(encryptedData), a)
    var decryptedData = zlib.gunzipSync(r).toString();
    return decryptedData
}

12.png

Complete code

Follow K哥爬虫 on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

The following shows only part of the key code and cannot be run directly! Complete code repository: https://github.com/kgepachong/crawler/

JavaScript decryption code

/* ==================================
# @Time    : 2021-12-31
# @Author  : WeChat public account: K哥爬虫
# @FileName: main.js
# @Software: PyCharm
# ================================== */

var zlib = require('zlib');

function L(e, t) {
    if ("1" == t)
        return [7, 65, 75, 31, 71, 101, 57, 0];
    for (var n = [], a = 0, r = t.length; a < r; a += 2)
        n.push(e.substr(1 * t.substr(a, 2), 1).charCodeAt());
    return n
}

function Y(e, t) {
    for (var n, a = new Uint8Array(e.length), r = 0, c = e.length; r < c; r++)
        n = t[r % t.length],
            a[r] = e[r].charCodeAt() ^ n;
    return a
}

function B(e) {
    var t, n, a, r, c, u, i, o = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=", s = "", f = 0;
    e = e.replace(/[^A-Za-z0-9\+\/\=]/g, "");
    while (f < e.length)
        r = o.indexOf(e.charAt(f++)),
            c = o.indexOf(e.charAt(f++)),
            u = o.indexOf(e.charAt(f++)),
            i = o.indexOf(e.charAt(f++)),
            t = r << 2 | c >> 4,
            n = (15 & c) << 4 | u >> 2,
            a = (3 & u) << 6 | i,
            s += String.fromCharCode(t),
        64 != u && (s += String.fromCharCode(n)),
        64 != i && (s += String.fromCharCode(a));
    return s
}

function getDecryptedData(encryptedData, exor, loginToken) {
    var a = L(loginToken, exor);
    var r = Y(B(encryptedData), a)
    var decryptedData = zlib.gunzipSync(r).toString();
    return decryptedData
}

Python sample code

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-31
# @Author  : WeChat public account: K哥爬虫
# @FileName: main.py
# @Software: PyCharm
# ==================================


import execjs
import requests

news_est_url = "Redacted; see GitHub for the complete code: https://github.com/kgepachong/crawler"
login_token = "Replace with your own token!"
headers = {
    "Accept": "application/json, text/plain, */*",
    "Content-Type": "application/json",
    "Host": "Redacted; see GitHub for the complete code: https://github.com/kgepachong/crawler",
    "HTTP-X-TOKEN": login_token,
    "Origin": "Redacted; see GitHub for the complete code: https://github.com/kgepachong/crawler",
    "Referer": "Redacted; see GitHub for the complete code: https://github.com/kgepachong/crawler",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
}


def get_decrypted_data(encrypted_data, exor):
    with open('pedata_decrypt.js', 'r', encoding='utf-8') as f:
        pedata_js = f.read()
    decrypted_data = execjs.compile(pedata_js).call('getDecryptedData', encrypted_data, exor, login_token)
    return decrypted_data


def get_encrypted_data():
    data = {
        "type": "",
        "module": "LP",
        "page":
            {
                "currentPage": 1,
                "pageSize": 10
            }
    }
    response = requests.post(url=news_est_url, headers=headers, json=data).json()
    encrypted_data, exor = response["data"], response["exor"]
    return encrypted_data, exor


def main():
    encrypted_data, exor = get_encrypted_data()
    decrypted_data = get_decrypted_data(encrypted_data, exor)
    print(decrypted_data)


if __name__ == '__main__':
    main()
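As an alternative to calling the JavaScript through execjs, the three helpers are small enough to port to pure Python. The following is a minimal sketch under the assumption that the JavaScript above is the complete algorithm: base64 decode (B), XOR with a key derived from the token and exor (L and Y), then gunzip. The names derive_key, xor_bytes, decrypt_response, and encrypt_for_test are hypothetical, and the sketch has only been checked against a synthetic payload, not against the live service.

```python
import base64
import gzip
import json

FIXED_KEY = bytes([7, 65, 75, 31, 71, 101, 57, 0])  # used when exor is "1"


def derive_key(login_token, exor):
    """Port of L: two-digit chunks of exor index into the login token."""
    exor = str(exor)
    if exor == "1":
        return FIXED_KEY
    return bytes(
        ord(login_token[int(exor[i:i + 2])]) & 0xFF
        for i in range(0, len(exor), 2)
    )


def xor_bytes(data, key):
    """Port of Y: XOR every byte with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_response(encrypted_data, exor, login_token):
    """Port of M: base64 decode, XOR, gunzip, then parse the JSON text."""
    key = derive_key(login_token, exor)
    raw = base64.b64decode(encrypted_data)  # stands in for the hand-written B
    plain = gzip.decompress(xor_bytes(raw, key))
    return json.loads(plain.decode("utf-8"))


def encrypt_for_test(obj, exor, login_token):
    """Inverse of decrypt_response, only used to build an offline test payload."""
    key = derive_key(login_token, exor)
    packed = gzip.compress(json.dumps(obj).encode("utf-8"))
    return base64.b64encode(xor_bytes(packed, key)).decode("ascii")


# Made-up token and exor; a real exor comes from the response itself.
token = "abcdefghijklmnop"
sample = encrypt_for_test({"title": "hello"}, "000315", token)
print(decrypt_response(sample, "000315", token))  # {'title': 'hello'}
```

With a real response, the call would be decrypt_response(response["data"], response["exor"], login_token), which removes the Node.js dependency that execjs needs.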

