Follow the WeChat official account K哥爬虫 (Brother K Crawler) for ongoing posts on advanced crawling, JS/Android reverse engineering, and other hands-on technical content!

Disclaimer

Everything in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly forbidden, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and it will be removed immediately!

Reverse-engineering target

  • Goal: Wangluozhe anti-crawler practice platform, Challenge 6: JS encryption with environment-simulation detection
  • Link: http://spider.wangluozhe.com/challenge/6
  • Introduction: As in the earlier challenges, collect all the numbers on the 100 pages and compute their sum. Note: do not reuse a parameter value, and don't fool yourself!

01.png

Packet capture analysis

Packet capture shows that, unlike the previous challenges, this one does not vary a parameter in the Payload. Instead, the Request Headers carry a hexin-v value that changes on every request. Anyone who has written a crawler for a certain "Huashun" finance site will recognize this parameter, which that site also uses widely, as shown in the following figures:

02.png

03.png

Locating the encryption

First, try a global search for hexin-v. The only hit is in 6.js, which is clearly obfuscated, so the search alone does not pinpoint where the value is generated. On closer inspection, all of 6.js is one self-executing function (IIFE) that receives seven arrays as arguments, bound to the parameters n, t, r, e, a, u, and c, as shown below:

!function (n, t, r, e, a, u, c) {
}(
    [],[],[],[],[],[],[]
);

6.js reads its values from these arrays by element index, so the obfuscation is simple: to restore the code, write a script that replaces each indexed lookup with the corresponding array value. This example is simple enough that deobfuscation is not actually necessary.
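The index-replacement idea can be sketched roughly as follows. This is a minimal illustration and not the author's tooling: the `source` string is a toy stand-in for 6.js, and `inlineArrayLookups` is a hypothetical helper. It assumes the arrays hold only primitive values and that no inner function shadows the parameter names, which does not hold everywhere in the real file, so an AST-based tool would be more robust in practice.

```javascript
// Toy stand-in for 6.js (assumption): the real file passes seven large arrays.
const source = `!function (n, t) {
    var a = n[0] + t[1];
    var b = t[0];
}(["hexin", "-"], ["v", 42]);`;

// Hypothetical helper: replace every name[index] lookup with the literal it refers to.
function inlineArrayLookups(code, arrays) {
    const names = Object.keys(arrays).join("|");
    const lookup = new RegExp(`\\b(${names})\\[(\\d+)\\]`, "g");
    return code.replace(lookup, (match, name, index) => {
        const value = arrays[name][Number(index)];
        // Leave out-of-range lookups untouched rather than guessing.
        return value === undefined ? match : JSON.stringify(value);
    });
}

const restored = inlineArrayLookups(source, { n: ["hexin", "-"], t: ["v", 42] });
console.log(restored);
```

A purely textual replacement like this ignores scope, so the output should be checked by hand wherever a nested function reuses one of the parameter names.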

Because hexin-v lives in the Request Headers, we can hook setRequestHeader and drop into the debugger whenever the hexin-v header is set (how to inject a hook is not repeated here):

(function () {
    'use strict';
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        if (key == 'hexin-v') {
            debugger;
        }
        return org.apply(this, arguments);
    };
})();

04.png
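As a variation on the hook above, the sketch below records the header value and the call stack instead of pausing in the debugger, which can help when following the stack afterwards. It is only an illustration: `FakeXMLHttpRequest` is a stand-in class so the snippet runs under Node.js, and `hookHeader` and `captured` are hypothetical names. In a browser the same wrapper would be applied to window.XMLHttpRequest.

```javascript
// Stand-in for the browser class (assumption) so the sketch runs under Node.js.
class FakeXMLHttpRequest {
    constructor() { this.headers = {}; }
    setRequestHeader(key, value) { this.headers[key] = value; }
}

// Collected { value, stack } records for the watched header.
const captured = [];

// Hypothetical helper: wrap setRequestHeader and log calls for one header name.
function hookHeader(XhrClass, headerName) {
    const original = XhrClass.prototype.setRequestHeader;
    XhrClass.prototype.setRequestHeader = function (key, value) {
        if (String(key).toLowerCase() === headerName.toLowerCase()) {
            // Record the value and the current call stack instead of pausing.
            captured.push({ value: value, stack: new Error().stack });
        }
        return original.apply(this, arguments);
    };
}

hookHeader(FakeXMLHttpRequest, "hexin-v");
const xhr = new FakeXMLHttpRequest();
xhr.setRequestHeader("Content-Type", "text/plain");
xhr.setRequestHeader("hexin-v", "demo-value");
console.log(captured.length, captured[0].value);
```

The original method is still called through `original.apply`, so the page behaves as before while the hook observes it.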

Next, follow the call stack. One frame up, in 6.js, the value of h is the one we want: h = ct.update(), and ct.update() is actually x(), as shown in the following figure:

05.png

Step into x(): t is the value we want, and t = N():

06.png

Step into N(): et.encode(n) is the final value, and the function also reads things like mouse movement and clicks:

07.png

As noted above, 6.js is a single self-executing function and not much code, so instead of extracting functions one by one we simply define a global variable and export N through it. The pseudocode is as follows:

// Define a global variable
var Hexin;

!function (n, t, r, e, a, u, c) {
    // A lot of code omitted here
    function N() {
        S[T]++,
        S[f] = ot.serverTimeNow(),
        S[l] = ot.timeNow(),
        S[k] = zn,
        S[I] = it.getMouseMove(),
        S[_] = it.getMouseClick(),
        S[y] = it.getMouseWhell(),
        S[E] = it.getKeyDown(),
        S[A] = it.getClickPos().x,
        S[C] = it.getClickPos().y;
        var n = S.toBuffer();
        return et.encode(n)
    }
    // Assign the N function to the global variable
    Hexin = N
}(
    [],[],[],[],[],[],[]
);

// Custom function that returns the final hexin-v value
function getHexinV(){
    return Hexin()
}
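Since the pseudocode above cannot run on its own, here is a small runnable toy showing the same export pattern. Everything in it is an assumption made for illustration: `Exported`, `inner`, `counter`, and `getToken` are hypothetical names, and the counter only loosely mirrors how N() updates internal state on each call.

```javascript
// Global that will receive the inner function.
var Exported;

!function (n) {
    // Private state, invisible outside the IIFE (a toy stand-in for S).
    var counter = 0;

    function inner() {
        counter++;
        return n[0] + "-" + counter;
    }

    // Without this assignment, inner() could not be called from outside.
    Exported = inner;
}(["token"]);

// Wrapper in the same spirit as getHexinV().
function getToken() {
    return Exported();
}

const first = getToken();
const second = getToken();
console.log(first, second);
```

Because the exported function keeps its closure, the state inside the IIFE persists between calls, so each call yields a different value.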

Patching the environment

With the code rewritten as above, we can debug it locally, where window, document, and so on turn out to be undefined. Defining them as empty objects, as in earlier articles, then produces the error getElementsByTagName is not a function. getElementsByTagName returns the elements with a given tag name and belongs to the HTML DOM, which a local Node.js run does not provide.
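The failure described above can be reproduced in isolation. In this sketch, `touchDom` is a hypothetical stand-in for whichever line in 6.js calls getElementsByTagName, and `stubDocument` is a hand-written stub, shown only to make clear why an empty object is not enough. It is not a substitute for a real DOM implementation.

```javascript
// Hypothetical stand-in for the code in 6.js that touches the DOM.
function touchDom(doc) {
    return doc.getElementsByTagName("head")[0];
}

// 1. An empty object in place of document: the method does not exist.
let errorMessage = null;
try {
    touchDom({});
} catch (err) {
    errorMessage = err.message;
}
console.log(errorMessage);

// 2. A hand-written stub that supplies just the one method being called.
const stubDocument = {
    getElementsByTagName(tagName) {
        return [{ tagName: String(tagName).toUpperCase() }];
    },
};
const head = touchDom(stubDocument);
console.log(head.tagName);
```

Stubbing by hand works for one or two calls, but each missing property surfaces as a new error, which is why a fuller DOM implementation is more convenient here.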

One way to get a DOM environment directly in Node.js is the jsdom library, which its documentation describes as follows:

jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications. The latest versions of jsdom require Node.js v12 or newer. (Versions of jsdom below v17 still work with previous Node.js versions, but are unsupported.) For usage details, see the jsdom documentation.

Note that jsdom's support for the canvas element relies on the separate canvas package, so install it as well if the code needs it. The HTML canvas tag is used to draw graphics dynamically from a script (usually JavaScript). For details and usage, see the canvas documentation.

Adding the following code to the local JS gives us a DOM environment, and the script runs successfully:

// var canvas = require("canvas");
var jsdom = require("jsdom");
var {JSDOM} = jsdom;
var dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
navigator = window.navigator;

Combined with the Python code, each request carries a fresh hexin-v in its headers, the data on each page is summed in turn, and the final submission succeeds:

08.png

Complete code

Follow K哥爬虫 on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

The following shows only part of the key code and cannot be run directly! Complete code repository: https://github.com/kgepachong/crawler/

Key JavaScript encryption code

/* ==================================
# @Time    : 2021-12-20
# @Author  : WeChat official account: K哥爬虫
# @FileName: challenge_6.js
# @Software: PyCharm
# ================================== */


var TOKEN_SERVER_TIME = 1611313000.340;
var Hexin;
var jsdom = require("jsdom");
var {JSDOM} = jsdom;
var dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
navigator = window.navigator;

!function(n, t, r, e, a, u, c) {
    !function() {
        function Gn() {}
        var Qn = [new a[23](n[20]), new e[3](f + l + d + p)];
        function Zn() {}
        var Jn = [new t[16](c[13]), new u[9](e[19])], qn = a[24][u[16]] || a[24].getElementsByTagName(st(r[19], r[20]))[a[25]], nt;
        !function(o) {}(nt || (nt = {}));
        var tt;
        !function(o) {}(tt || (tt = {}));
        var rt = function() {}(), et;
        RT = rt
        !function(o) {}(et || (et = {}));
        function at() {}
        var ot;
        !function(o) {}(ot || (ot = {}));
        var it;
        !function(o) {}(it || (it = {}));
        var ut;
        !function(s) {}(ut || (ut = {}));
        var ct;
        !function(o) {
            function x() {}
            function L() {}
            function M() {}
            o[a[105]] = M;
            
            function N() {
                S[T]++,
                S[f] = ot.serverTimeNow(),
                S[l] = ot.timeNow(),
                S[k] = zn,
                S[I] = it.getMouseMove(),
                S[_] = it.getMouseClick(),
                S[y] = it.getMouseWhell(),
                S[E] = it.getKeyDown(),
                S[A] = it.getClickPos().x,
                S[C] = it.getClickPos().y;
                var n = S.toBuffer();
                return et.encode(n)
            }
            Hexin = N
            o[r[81]] = x
        }(ct || (ct = {}));

        function st() {}
        var vt;
        !function(o) {}(vt || (vt = {}));
        var ft;
        !function(r) {}(ft || (ft = {}))
    }()
}(
    [],[],[],[],[],[],[]
);


function getHexinV(){
    return Hexin()
}

// Test output
// console.log(getHexinV())

Key Python calculation code

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-20
# @Author  : WeChat official account: K哥爬虫
# @FileName: challenge_6.py
# @Software: PyCharm
# ==================================


import execjs
import requests


challenge_api = "http://spider.wangluozhe.com/challenge/api/6"
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "replace with your own cookie!",
    "Host": "spider.wangluozhe.com",
    "Origin": "http://spider.wangluozhe.com",
    "Referer": "http://spider.wangluozhe.com/challenge/6",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}


def get_hexin_v():
    with open('challenge_6.js', 'r', encoding='utf-8') as f:
        wlz_js = execjs.compile(f.read())
    hexin_v = wlz_js.call("getHexinV")
    print("hexin-v: ", hexin_v)
    return hexin_v


def main():
    result = 0
    for page in range(1, 101):
        data = {
            "page": page,
            "count": 10,
        }
        headers["hexin-v"] = get_hexin_v()
        response = requests.post(url=challenge_api, headers=headers, data=data).json()
        for d in response["data"]:
            result += d["value"]
    print("Result: ", result)


if __name__ == '__main__':
    main()

