Follow the WeChat official account Brother K Crawler (K哥爬虫) for ongoing posts on advanced crawling, JS/Android reverse engineering, and other hands-on technical content!

Disclaimer

All content in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences arising from such use. If anything here infringes your rights, please contact me and it will be removed immediately!

Foreword

The challenge itself is not hard, but it has plenty of pitfalls, mainly the anti-Hook tricks and the local debugging environment. This article walks through each pitfall in detail rather than skimming over them.

In this article you will learn:

  1. How to hook Function and the timer to get rid of the infinite debugger;
  2. How to deal with anti-Hook measures and locate the encrypted parameter _signature through a Hook;
  3. How the browser and the local environment differ, how to find the objects being checked (navigator, document, location, and so on), and how to patch the environment locally;
  4. How to debug locally in PyCharm, pinpoint the differences between the local and browser environments, and pass the detection.

Reverse engineering target

  • Target: the anti-crawler practice platform at spider.wangluozhe.com, Challenge 1: JS obfuscation/encryption and anti-Hook
  • Link: http://spider.wangluozhe.com/challenge/1
  • Description: the answer is the sum of all the data across the 100 pages. The challenge must be solved with Hooks; do not use AST, code extraction, or JS deobfuscation tools. (How to write and use Hook code is covered in Brother K's earlier articles and is not repeated in detail here.)

01.png

Bypassing the infinite debugger

First, notice that clicking to turn the page does not change the URL, so the data is most likely loaded through Ajax requests, with some parameter changing on every request. If you press F12 out of habit to look for the encrypted parameter, execution pauses immediately and you are stuck in an infinite debugger. Following the call stack leads to the word debugger, as shown in the figure below:

02.png

This situation has come up in Brother K's earlier cases, where we simply rewrote the JS and replaced the word debugger. This challenge, however, clearly wants us to get past the infinite debugger with a Hook. Besides debugger, notice the constructor in front of it. In JavaScript, constructor is the constructor method, normally invoked when an object is created or instantiated; its basic syntax is constructor([arguments]) { ... } (see the MDN page on constructor for details). In this case, debugger is clearly the argument passed to constructor, so we can write the following Hook code to get past the infinite debugger:

// Keep a reference to the original constructor
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the argument is "debugger", return an empty function
    if (a == "debugger") {
        return function () {};
    }
    // Otherwise defer to the original constructor, forwarding all arguments
    return Function.prototype.constructor_.apply(this, arguments);
};
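To see why hooking constructor works: every function inherits constructor from Function.prototype, where it refers to Function itself, so calling someFunction.constructor("debugger") compiles a new function whose body is a debugger statement. The sketch below is a minimal illustration that runs in Node.js rather than on the target page; the names originalConstructor, trap, and adder are made up for the demo and do not come from the site's code.

```javascript
// Keep a reference to the real Function constructor.
const originalConstructor = Function.prototype.constructor;

// Replace it with a wrapper that swallows the "debugger" payload only.
Function.prototype.constructor = function (...args) {
    if (args.length === 1 && args[0] === "debugger") {
        return function () {}; // harmless no-op instead of a debugger trap
    }
    // Everything else is compiled as usual, with all arguments forwarded.
    return originalConstructor.apply(this, args);
};

// What an anti-debug script typically does (hypothetical call site):
const trap = (function () {}).constructor("debugger");
trap(); // no pause: the hook handed back an empty function

// Ordinary dynamically built functions still work.
const adder = (function () {}).constructor("a", "b", "return a + b");
console.log(adder(2, 3));
```

Forwarding every argument matters because the constructor accepts parameter names followed by a body, so passing only the first argument would break calls like the adder example.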

There are many ways to inject Hook code: typing it straight into the browser DevTools console (it is lost when the page refreshes), Fiddler plug-in injection, Tampermonkey script injection, a self-written browser extension, and so on. All of these are covered in Brother K's earlier articles, so they are not repeated here.

This time we use Fiddler plug-in injection. After injecting the Hook above, we land in an infinite debugger again, this time through setInterval. This is clearly a timer. It takes two main parameters: the function to execute, and the interval in milliseconds at which that function is called repeatedly (see the Runoob tutorial on Window setInterval() for details). We can hook it away in the same manner:

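// Keep a reference to the original timer
var setInterval_ = setInterval;
setInterval = function (func, time) {
    // If the interval is 0x7d0, return an empty function
    // (you could also skip the check and always return empty; there are many ways to write this)
    if (time == 0x7d0) {
        return function () {};
    }
    // Otherwise defer to the original timer
    return setInterval_(func, time);
};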
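The same idea can be tried outside the browser. The following is a small sketch for Node.js, assuming the trap is always registered with a delay of 0x7d0 (2000 ms) as in this challenge; blockedDelays, blockedId, and liveId are illustrative names only. Unlike the Hook above, it returns a dummy numeric id, since callers normally expect something they can hand to clearInterval.

```javascript
// Keep a reference to the real timer function.
const originalSetInterval = globalThis.setInterval;
const blockedDelays = []; // record of what the hook refused to schedule

globalThis.setInterval = function (callback, delay, ...rest) {
    if (delay === 0x7d0) {          // 0x7d0 === 2000 ms, the delay used by the trap
        blockedDelays.push(delay);
        return 0;                   // dummy id; nothing is scheduled
    }
    return originalSetInterval(callback, delay, ...rest);
};

// Hypothetical anti-debug timer: never scheduled, so it never fires.
const blockedId = setInterval(function () { debugger; }, 0x7d0);

// Any other timer goes through untouched.
const liveId = setInterval(function () {}, 50);
clearInterval(liveId); // stop it so the demo exits cleanly

console.log(blockedId, blockedDelays);
```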

03.png

Paste both pieces of Hook code into the plug-in, enable the Hook, and refresh the page; the infinite debugger is gone.

04.png

Hooking the parameter

With the infinite debugger out of the way, click any page. The captured request is a POST; in the Form Data, page is the page number, count is the number of records per page, and _signature is the parameter we need to reverse, as shown in the figure below:

05.png
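As a rough illustration of what that form body looks like on the wire, here is a sketch that assembles it with URLSearchParams. The count of 10 matches the Python request code at the end of this article; buildFormBody and the placeholder signature are assumptions for the demo, since the real value has to come from the site's JS.

```javascript
// Build the URL-encoded form body for one page request.
// The signature passed in below is a placeholder, not a valid value.
function buildFormBody(page, signature) {
    const params = new URLSearchParams({
        page: String(page),
        count: "10",
        _signature: signature,
    });
    return params.toString();
}

const body = buildFormBody(1, "PLACEHOLDER_SIGNATURE");
console.log(body);
```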

Searching directly for _signature gives a single result, in which window.get_sign() is what sets _signature, as shown in the figure below:

06.png

Here comes the problem! Look at the challenge title again: JS obfuscation/encryption and anti-Hook. The challenge author has repeatedly stressed that this is a test of Hook skills, yet so far we have not run into any anti-Hook measure. Finding _signature by direct search therefore misses the point of the challenge; it should be obtained through a Hook, and the Hook that follows is certainly not going to be smooth sailing!

Without further ado, let's write a Hook for window._signature, as shown below:

(function() {
    // Strict mode, so more mistakes are reported as errors
    'use strict';
    // window is the object to hook; here we hook its _signature property
    var _signatureTemp = "";
    Object.defineProperty(window, '_signature', {
        // Hook the setter, i.e. what runs on assignment
        set: function(val) {
                console.log('Hook caught _signature being set ->', val);
                debugger;
                _signatureTemp = val;
                return val;
        },
        // Hook the getter, i.e. what runs on read
        get: function()
        {
            return _signatureTemp;
        }
    });
})();
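The mechanics of this property Hook can be checked in isolation. The sketch below swaps the browser's window for a plain object (fakeWindow, an assumption so that it runs in Node.js) and collects assigned values in an array instead of pausing on debugger.

```javascript
// Stand-in for the browser's window object (assumption: running in Node.js).
const fakeWindow = {};
const captured = []; // every value assigned to _signature ends up here

(function () {
    "use strict";
    let current = "";
    Object.defineProperty(fakeWindow, "_signature", {
        set(value) {
            captured.push(value); // in the browser you would log and hit `debugger` here
            current = value;
        },
        get() {
            return current;
        },
    });
})();

// Simulate the site's code writing the parameter.
fakeWindow._signature = "demo-value";
console.log(fakeWindow._signature, captured);
```

Pausing inside the setter is what makes the call stack useful: the frame directly below it is the code that computed the value.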

Put the two infinite-debugger bypass Hooks and the _signature Hook together and inject them with the Fiddler plug-in. (Note that the debugger-bypass code should go after the _signature Hook code, otherwise it may not take effect; this may be a bug in the plug-in.) Refresh the page and the buttons on the page are gone. Open DevTools and the upper-right corner shows two errors; click through to jump to the offending code. The console shows the error message as well, as shown in the figure below:

07.png

The whole of 1.js is obfuscated with sojson jsjiami v6. Printing some of the obfuscated expressions in the console lets us restore this part of the code by hand. The two variables i1I1i1li and illllli1 are hard on the eyes, so we replace them with a and b, as shown below:

(function() {
    'use strict';
    var a = '';
    Object["defineProperty"](window, "_signature", {
        set: function(b) {
            a = b;
            return b;
        },
        get: function() {
            return a;
        }
    });
}());

Look familiar? It has get and set methods; isn't this exactly a Hook on window._signature? The logic is that when set assigns _signature, the value is stored in a, and when get reads _signature, a is returned. This has no actual effect on _signature, so what is the point of this code? And why is an error reported once we add our own Hook?

Take a look at the error message: Uncaught TypeError: Cannot redefine property: _signature. Cannot redefine _signature? Our Hook code runs Object.defineProperty(window, '_signature', {}) as soon as the page loads, so an error is raised when the site's JS calls defineProperty on it again. The fix is simple: since the site's own Hook code does not affect _signature, just delete it! This is presumably the anti-Hook measure.
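The collision is easy to reproduce away from the site. In the sketch below, target stands in for window and the two accessor pairs are invented for the demo; the point is that a property created by defineProperty is non-configurable unless stated otherwise, so a second definition with different accessors throws.

```javascript
const target = {};

// First definition (our Hook): configurable defaults to false.
Object.defineProperty(target, "_signature", {
    get() { return "from our hook"; },
    set(value) {},
});

// Second definition with different accessors, like the site's own block.
let errorMessage = null;
try {
    Object.defineProperty(target, "_signature", {
        get() { return "from the site"; },
        set(value) {},
    });
} catch (err) {
    errorMessage = `${err.name}: ${err.message}`;
}

console.log(errorMessage);      // a TypeError about redefining the property
console.log(target._signature); // our accessor is still the one in place
```

Declaring our Hook with configurable: true would avoid the exception, but the site's definition would then replace our setter and the Hook would stop firing, which is one reason to remove the site's block instead.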

Save the original 1.js locally, delete its Hook code, and use Fiddler's AutoResponder to replace the response (there are many ways to do the replacement, also covered in Brother K's earlier articles). Refresh again: the exception is gone and _signature is hooked successfully.

08.png

09.png

Reversing the parameter

With the Hook working, follow the call stack and the method is exposed directly: window._signature = window.byted_acrawler(window.sign())

10.png

Look at window.sign() first. Selecting it shows that it is just a 13-digit millisecond timestamp. Let's step into 1.js to see how it is implemented:

11.png

Let's manually restore part of the obfuscated code:

window["sign"] = function sign() {
    try {
        div = document["createElement"];
        return Date["parse"](new Date())["toString"]();
    } catch (IIl1lI1i) {
        return "123456789abcdefghigklmnopqrstuvwxyz";
    }
}

Pay close attention here, because a trap has been laid. If you skip past this thinking a timestamp is nothing interesting, you are badly mistaken! This is a try-catch statement containing div = document["createElement"];, which touches the HTML DOM Document object to reference its createElement method. In the browser this is no problem: the try branch runs and the timestamp is returned. In local Node.js, however, the "document is not defined" error is caught and the catch branch returns the string of digits and letters instead, so the final result is bound to be wrong!

The fix is simple. In the local code you can remove the try-catch and return the timestamp directly, define document at the top, or comment out the div line. Brother K recommends defining document, because who can guarantee there are no similar traps elsewhere? If one is hidden deep and you miss it, all the effort is wasted.
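The effect of the trap can be seen side by side in a few lines. This sketch keeps the shape of the restored sign() but takes document as a parameter, an assumption made purely so that both environments can be tried in one Node.js script.

```javascript
// Same shape as the restored sign(), with `document` passed in for the demo.
function sign(document) {
    try {
        const div = document["createElement"]; // throws when document is undefined
        return Date.parse(new Date()).toString();
    } catch (err) {
        return "123456789abcdefghigklmnopqrstuvwxyz";
    }
}

const withoutDocument = sign(undefined); // bare Node.js: falls into catch
const withStub = sign({});               // minimal stub: takes the try branch

console.log(withoutDocument);
console.log(withStub);
```

An empty object is enough here because the code only reads the property and never calls it; a check that actually invokes createElement would need a stub function as well.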

Next look at window.byted_acrawler(). Its return statement mainly uses the sign() method, i.e. window.sign(), and the IIl1llI1() method. Stepping into IIl1llI1() shows the same kind of try-catch statement, with nav = navigator[liIIIi11('2b')]; playing exactly the same role as div did before. Again, the recommendation is to define navigator directly, as shown in the figure below:

14.png

15.png

That covers the methods involved. After defining window, document, and navigator, run the code locally and it reports window[liIIIi11(...)] is not a function:

16.png

Checking in the web page shows that this method is actually a timer that serves no real purpose, so just comment it out:

17.png

Local debugging with PyCharm

After the steps above, running locally again reports window.signs is not a function, and the error comes from an eval statement. Looking at the same eval statement in the browser, it reads window.sign(). So why has it become window.signs() locally, with an extra s out of nowhere?

18.png

19.png

There is only one explanation: the local environment differs from the browser environment. The obfuscated code must contain environment detection that modifies the code passed to eval, adding an extra s, when it is not running in a browser. If you simply delete the whole function containing the eval statement along with the setInterval timer above, the code also runs normally, but Brother K always chases the details! We have to find out why that extra s is added!

Let's debug locally in PyCharm to see where the s gets added. The error is at the eval statement, so click that line to set a breakpoint, then right-click and choose Debug to enter the debugging view. (PS: the original code contains an infinite debugger; if you leave it untreated, debugging in PyCharm hits the infinite debugger too. Either put the earlier Hook code at the top of the local code, or delete the corresponding functions or variables.)

20.png

The call stack is on the left and the variable values are on the right; overall it is similar to Chrome DevTools. See the official JetBrains documentation for detailed usage. The main things to know are the 8 buttons in the picture:

  1. Show Execution Point (Alt + F10): if the cursor is on another line or in another file, this jumps back to the line where execution is currently paused;
  2. Step Over (F8): go down line by line; if the line calls a method, the method is not entered;
  3. Step Into (F7): if the current line calls a method, enter it. This is generally used for your own methods and does not enter library code;
  4. Force Step Into (Alt + Shift + F7): enter any method, including library code, which is useful when reading the underlying source;
  5. Step Out (Shift + F8): leave the method you stepped into and return to its call site. At that point the method has finished executing, but the assignment of its result has not yet completed;
  6. Restart Frame: discard the current frame and re-execute it from the start;
  7. Run to Cursor (Alt + F9): run until the line where the cursor is, without setting a breakpoint;
  8. Evaluate Expression (Alt + F8): evaluate an expression directly, without typing it on the command line.

Clicking Step Into takes us into function IIlIliii(). It too uses a try-catch statement. Stepping further, an exception is caught with the message Cannot read property 'location' of undefined, as shown in the figure below:

21.png

Print the value of each variable and manually restore the code as follows:

function IIlIliii(II1, iIIiIIi1) {
    try {
        href = window["document"]["location"]["href"];
        check_screen = screen["availHeight"];
        window["code"] = "gnature = window.byted_acrawler(window.sign())";
        return '';
    } catch (I1IiI1il) {
        window["code"] = "gnature = window.byted_acrawlers(window.signs())";
        return '';
    }
}

Now we have the clue. Locally there is no document, location, href, or availHeight, so execution falls into the catch branch, the code becomes window.signs(), and an error is reported. The fix is simple: either delete the extra code and define the string directly as the version without the s, or patch the environment by checking the values of href and screen in the browser and defining them:

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
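To confirm that these two stubs are what flips the branch, the check can be replayed in isolation. The sketch below mirrors the restored IIlIliii() logic but passes window and screen in as parameters, an assumption made so that the bare and patched environments can be compared in one Node.js run; pickCode is a made-up name.

```javascript
// Restored logic of the environment check, with the globals passed in.
function pickCode(window, screen) {
    try {
        const href = window["document"]["location"]["href"];
        const checkScreen = screen["availHeight"];
        return "gnature = window.byted_acrawler(window.sign())";
    } catch (err) {
        return "gnature = window.byted_acrawlers(window.signs())";
    }
}

// Bare environment: window has no document and screen is missing.
const bareNode = pickCode({}, undefined);

// Patched environment: the same stubs as defined above.
const patched = pickCode(
    { document: { location: { href: "http://spider.wangluozhe.com/challenge/1" } } },
    { availHeight: 1040 }
);

console.log(bareNode);
console.log(patched);
```

Note that the check only reads the properties, so any value works for href and availHeight here; copying the browser's real values is simply the safer habit in case other code compares them.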

Run it again and it reports sign is not defined. Here sign() is really window.sign(), i.e. the window[liIIIi11('a')] method below; just switch to either of those ways of writing it:

22.png

Run again and there are no errors. Now we can write a method that returns _signature; either of the following forms works:

function getSign(){
    return window[liIIIi11('9')](window[liIIIi11('a')]())
}

function getSign(){
    return window.byted_acrawler(window.sign())
}

// Test output
console.log(getSign())

Running it produces no output in PyCharm. Likewise, calling console.log in the console on the challenge page prints nothing, as shown in the figure below:

23.png

It looks like console.log has been tampered with as well. This is not really a big problem: we could call the getSign() method we just wrote from a Python script and get _signature that way. But, once again, Brother K always chases the details! We have to find where console.log is handled and make it work normally again!

We keep using PyCharm here, which is also good practice for local debugging. Set a breakpoint on the console.log(getSign()) statement and step through. We end up at the statement var IlII1li1 = function() {};. Checking the variable values at this point shows that console.log, console.warn, and so on have all been set to empty functions, as shown in the figure below:

24.png

Stepping further, it returns straight away. Presumably the console methods are blanked out when the JS first runs, so set a few breakpoints in the method suspected of handling the console and debug again. Execution goes into the else branch, which assigns IlII1li1, the empty function, to the console methods, as shown in the figure below:

25.png

Having located the problem, comment out the if-else statement so the console methods are no longer blanked, debug again, and the result is printed directly:

26.png
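Commenting out the if-else is the approach taken here. As an alternative that leaves the obfuscated code untouched, one could save the console methods before the script runs and put them back afterwards. The sketch below only simulates the blanking with a local noop function; saved and silenced are illustrative names, not part of the site's code.

```javascript
// Save the real methods before any third-party script gets a chance to run.
const saved = { log: console.log, warn: console.warn };

// Simulate what the obfuscated code does: overwrite console methods with a no-op.
const noop = function () {};
console.log = noop;
console.warn = noop;

const silenced = console.log === noop; // console.log prints nothing at this point
saved.log.call(console, "still visible via the saved reference");

// Put the originals back once the script has finished initialising.
Object.assign(console, saved);
console.log("console.log works again");
```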

Use Python to request each page with its _signature and add up the data page by page; the final submission succeeds:

2.png

Complete code

Follow Brother K Crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

The following only demonstrates part of the key code and cannot be run directly! Full code repository: https://github.com/kgepachong/crawler/

Key JavaScript encryption code structure

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
var document = {}
var navigator = {}
var location = {}

// Keep a reference to the original constructor
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the argument is "debugger", return an empty function
    if (a == "debugger") {
        return function () {};
    }
    // Otherwise defer to the original constructor, forwarding all arguments
    return Function.prototype.constructor_.apply(this, arguments);
};

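// Keep a reference to the original timer
var setInterval_ = setInterval;
setInterval = function (func, time) {
    // If the interval is 0x7d0, return an empty function
    // (you could also skip the check and always return empty; there are many ways to write this)
    if (time == 0x7d0) {
        return function () {};
    }
    // Otherwise defer to the original timer
    return setInterval_(func, time);
};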

var iil = 'jsjiami.com.v6'
  , iiIIilii = [iil, '\x73\x65\x74\x49\x6e\x74\x65\x72\x76\x61\x6c', '\x6a\x73\x6a', ...];
var liIIIi11 = function(_0x11145e, _0x3cbe90) {
    _0x11145e = ~~'0x'['concat'](_0x11145e);
    var _0x636e4d = iiIIilii[_0x11145e];
    return _0x636e4d;
};
(function(_0x52284d, _0xfd26eb) {
    var _0x1bba22 = 0x0;
    for (_0xfd26eb = _0x52284d['shift'](_0x1bba22 >> 0x2); _0xfd26eb && _0xfd26eb !== (_0x52284d['pop'](_0x1bba22 >> 0x3) + '')['replace'](/[fnwRwdGKbwKrRFCtSC=]/g, ''); _0x1bba22++) {
        _0x1bba22 = _0x1bba22 ^ 0x661c2;
    }
}(iiIIilii, liIIIi11));
// window[liIIIi11('0')](function() {
//     var l111IlII = liIIIi11('1') + liIIIi11('2');
//     if (typeof iil == liIIIi11('3') + liIIIi11('4') || iil != l111IlII + liIIIi11('5') + l111IlII[liIIIi11('6')]) {
//         var Ilil11iI = [];
//         while (Ilil11iI[liIIIi11('6')] > -0x1) {
//             Ilil11iI[liIIIi11('7')](Ilil11iI[liIIIi11('6')] ^ 0x2);
//         }
//     }
//     iliI1lli();
// }, 0x7d0);
(function() {
    var iiIIiil = function() {}();
    var l1liii11 = function() {}();
    window[liIIIi11('9')] = function byted_acrawler() {};
    window[liIIIi11('a')] = function sign() {};
    (function() {}());
    // (function() {
    //     'use strict';
    //     var i1I1i1li = '';
    //     Object[liIIIi11('1f')](window, liIIIi11('21'), {
    //         '\x73\x65\x74': function(illllli1) {
    //             i1I1i1li = illllli1;
    //             return illllli1;
    //         },
    //         '\x67\x65\x74': function() {
    //             return i1I1i1li;
    //         }
    //     });
    // }());
    var iiil1 = 0x0;
    var l11il1l1 = '';
    var ii1Ii = 0x8;
    function i1Il11i(iiIll1i) {}
    function I1lIIlil(l11l1iIi) {}
    function lllIIiI(IIi1lIil) {}

    // N functions omitted here
    
    window[liIIIi11('37')]();
}());

function iliI1lli(lil1I1) {
    function lili11I(l11I11l1) {
        if (typeof l11I11l1 === liIIIi11('38')) {
            return function(lllI11i) {}
            [liIIIi11('39')](liIIIi11('3a'))[liIIIi11('8')](liIIIi11('3b'));
        } else {
            if (('' + l11I11l1 / l11I11l1)[liIIIi11('6')] !== 0x1 || l11I11l1 % 0x14 === 0x0) {
                (function() {
                    return !![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('3e')](liIIIi11('3f')));
            } else {
                (function() {
                    return ![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('8')](liIIIi11('40')));
            }
        }
        lili11I(++l11I11l1);
    }
    try {
        if (lil1I1) {
            return lili11I;
        } else {
            lili11I(0x0);
        }
    } catch (liIlI1il) {}
}
;iil = 'jsjiami.com.v6';

// function getSign(){
//     return window[liIIIi11('9')](window[liIIIi11('a')]())
// }

function getSign(){
    return window.byted_acrawler(window.sign())
}

console.log(getSign())

Key Python calculation code

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-01
# @Author  : WeChat official account: K哥爬虫
# @FileName: challenge_1.py
# @Software: PyCharm
# ==================================


import execjs
import requests

challenge_api = "http://spider.wangluozhe.com/challenge/api/1"
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "Replace this with your own cookie value!",
    "Host": "spider.wangluozhe.com",
    "Origin": "http://spider.wangluozhe.com",
    "Referer": "http://spider.wangluozhe.com/challenge/1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}


def get_signature():
    with open('challenge_1.js', 'r', encoding='utf-8') as f:
        ppdai_js = execjs.compile(f.read())
    signature = ppdai_js.call("getSign")
    print("signature: ", signature)
    return signature


def main():
    result = 0
    for page in range(1, 101):
        data = {
            "page": page,
            "count": 10,
            "_signature": get_signature()
        }
        response = requests.post(url=challenge_api, headers=headers, data=data).json()
        for d in response["data"]:
            result += d["value"]
    print("Result: ", result)


if __name__ == '__main__':
    main()

