Ruishu for Everyone Series: Reverse Analysis of Ruishu 4th-Generation JS

Disclaimer

All content in this article is for learning and exchange only and serves no other purpose; complete code is not provided. Captured packet contents, sensitive URLs, data interfaces, etc. have been desensitized. Commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences arising from such use!

The author is not responsible for any incident caused by unauthorized use of the techniques explained in this article. If anything here infringes on your rights, please contact the author via the official account [K Brother Crawler] and it will be removed immediately!

Foreword

Ruishu Dynamic Security Botgate (a bot firewall) is built around "dynamic security" technology. Using dynamic encapsulation, dynamic verification, dynamic obfuscation, dynamic tokens and similar techniques, it continuously changes the underlying code of server web pages, increasing the "unpredictability of server behavior". This delivers all-round "active protection" from client to server and strong security protection for all kinds of Web and HTML5 applications.

Ruishu Botgate is mostly used in the government, enterprise, finance, and telecom-operator industries, and it was once regarded as the ceiling of anti-scraping. As more and more reverse-engineering experts have taken it on in recent years, related write-ups have appeared one after another, and the era of "Ruishu for everyone" has truly arrived. I would also like to thank reverse-engineering experts such as Nanda and Lazy God for unveiling the mystery of Ruishu.

There are basically three ways to get past Ruishu: automation tools (with their fingerprint features hidden), RPC remote calls, and JS reversing (hard-extracting the code, or patching the environment). This article covers JS reversing by hard-extracting the code and tries to explain as many details as possible.

Ruishu Features and Differences Between Versions

Most websites that use Ruishu share the following characteristics (some special versions may differ; let's look at the mainstream ones first):

1. Open the developer tools (F12) and two typical infinite-debugger traps will appear in sequence:

2. In Ruishu's obfuscated JS, most variable and function names look like _$xx, and there is a lot of if-else control flow. Newer versions of Ruishu may also contain JSVMP and many ternary expressions:

3. Looking at the requests, there are three typical ones. The first request returns status code 202 (Ruishu 3 and 4) or 412 (Ruishu 5). Next, a separate JS file is requested, and then the page is requested again. Other XHR requests carry a suffix parameter whose value is generated by JS and changes every time. The first digit of the suffix value is the Ruishu version: for example, MmEwMD=4xxxxx indicates the 4th generation and bX3Xf9nD=5xxxxx indicates the 5th generation:

4. Look at the cookies. In the 3rd and 4th generations there are two cookies whose names end with T and S. The S cookie is returned by the first 202 response, and the T cookie is generated dynamically by JS. The T and S are usually preceded by 80 or 443, and the first digit of the cookie value is the Ruishu version. (Why can the version be determined from the first digit? Does the first digit stay the same within a version? We will find the answers when analyzing the JS.) For example:

FSSBBIl1UgzbN7N80T=37Na97B.nWX3....: 80 is the default port of the HTTP protocol, corresponding to an HTTP request.
FSSBBIl1UgzbN7N443T=4a.tr1kEXk.....: 443 is the default port of the HTTPS protocol, corresponding to an HTTPS request.

The 5th generation also has two cookies whose names end with T and S, but some special 5th-generation variants use names ending with O and P instead. The O cookie is returned by the first 412 response, the P cookie is generated by JS, and the first digit of the cookie value is again the Ruishu version. Unlike the 3rd and 4th generations, 5th-generation cookie names contain no port number. For example:

vsKWUwn3HsfIO=57C6DwDUXS.....: the name ends with O and the value starts with 5, indicating 5th-generation Ruishu;
WvY7XhIMu0fGT=53.9fybty......: the name ends with T and the value starts with 5, indicating 5th-generation Ruishu.

5. Look at the entry point. At one step, Ruishu loads more than 10,000 lines of code into a VM script (code executed via eval, which DevTools lists as VMxxx). The entry that loads this code differs between versions. Where exactly is this entry, and how do we locate it? That is covered in detail in the reverse analysis below. Examples:

3rd generation: _$aW = _$c6[_$l6()](_$wc, _$mo);, where _$c6 is eval and _$l6() returns call;

4th generation: ret = _$DG.call(_$6a, _$YK);, where _$DG is actually eval, and the keywords ret and call appear in plaintext;

5th generation: there are many 5th-generation variants. The original ones resemble the 4th generation, e.g. ret = _$Yg.call(_$kc, _$mH);, with the keyword ret and call in plaintext. Some versions have no ret keyword, e.g. _$ap = _$j5.call(_$_T, _$gp);. Others are fully obfuscated like the 3rd generation, e.g. _$x8 = _$mP[_$nU[15]](_$z3, _$Ec);, where _$mP is eval and _$nU[15] is call. Here call is obfuscated too. Unlike the 3rd generation, the 5th generation obtains it by indexing into an array;

Of course, to tell versions apart accurately you have to combine all of these signals. Most importantly, look at the internal implementation logic and the page's code structure. For example, the 4th generation has a step that generates a fake cookie, while the 5th generation does not. Some special versions look like the 5th generation but add JSVMP and ternary expressions, which sets them apart from the traditional 5th generation. Occasionally a new "April Fool's" version appears out of nowhere and differs again. Once you have analyzed each version yourself, telling them apart becomes easy.
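The version heuristics in items 3 and 4 above can be written as a small helper, sketched below in JavaScript. The function names are hypothetical. As just noted, special versions may break these rules, so treat the result as a hint only.

```javascript
// Sketch of the identification heuristics from items 3 and 4 above.
// Special Ruishu versions may not follow them; the result is only a hint.

// Inspect a cookie by its name and value.
function inspectRuishuCookie(name, value) {
  // 3rd/4th generation names end with 80T/80S (HTTP) or 443T/443S (HTTPS).
  const portMatch = /(80|443)[TS]$/.exec(name);
  const firstChar = value.charAt(0);
  return {
    // The first character of the value is read as the generation digit.
    version: /^[1-9]$/.test(firstChar) ? Number(firstChar) : null,
    port: portMatch ? Number(portMatch[1]) : null,
    // T/P cookies are generated by JS; S/O cookies come from the first response.
    setByJs: /[TP]$/.test(name),
  };
}

// Read the generation digit from the suffix parameter of an XHR URL.
function versionFromSuffix(url, param) {
  // A dummy base lets relative URLs such as "/api/list?x=1" be parsed.
  const value = new URL(url, "http://placeholder.invalid").searchParams.get(param);
  return value && /^[1-9]/.test(value) ? Number(value[0]) : null;
}
```

For example, inspectRuishuCookie("FSSBBIl1UgzbN7N443T", "4a.tr1kEXk") reports version 4 on port 443, and the cookie is flagged as generated by JS.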

Locating the Cookie Entry

The 4th-generation Ruishu website used in this case is (Base64-encoded): aHR0cDovL3d3dy5mYW5nZGkuY29tLmNuL25ld19ob3VzZS9uZXdfaG91c2VfZGV0YWlsLmh0bWw=

First, get past the infinite debugger. Whether or not you do has no real effect on the later analysis. Simply right-click the line and choose "Never pause here":

To locate the cookie, a hook is the fastest approach. Inject the following hook code with a packet-capture tool such as Fiddler, a Tampermonkey userscript, or a browser extension:

(function () {
    // Strict mode: surface all errors
    'use strict';
    // Hook the cookie property of document
    var cookieTemp = "";
    Object.defineProperty(document, 'cookie', {
        // Hook the setter, i.e. the assignment method
        set: function (val) {
            // This lets us quickly set a breakpoint on the line below
            // and thus quickly locate the code that sets the cookie
            console.log('Hook captured cookie set ->', val);
            debugger;
            cookieTemp = val;
            return val;
        },
        // Hook the getter, i.e. the read method
        get: function () {
            return cookieTemp;
        }
    });
})();
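The hook above keeps cookies in a private variable, so the browser's real cookie jar is never written, and the page may misbehave after the breakpoint. Below is a sketched variant that still forwards every write to the native accessor. hookCookie and onSet are illustrative names. The sketch relies on the standard fact that the cookie accessor is defined on Document.prototype.

```javascript
// Variant of the hook: report every cookie write, then forward it to the
// original accessor so the page keeps working normally.
// `hookCookie` and `onSet` are illustrative names, not part of any library.
function hookCookie(doc, proto, onSet) {
  // In a browser, the real accessor lives on Document.prototype.
  const native = Object.getOwnPropertyDescriptor(proto, "cookie");
  Object.defineProperty(doc, "cookie", {
    configurable: true,
    get() {
      return native.get.call(this);
    },
    set(val) {
      onSet(val); // put `debugger;` in the callback to break on each write
      native.set.call(this, val);
    },
  });
}

// Browser usage:
// hookCookie(document, Document.prototype, (v) => { console.log("cookie set ->", v); debugger; });
```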

The hook shows that the cookie is set in two places. After the breakpoint triggers, go up the call stack and you will see the code that assembles the cookie, with a structure similar to the following:

Look carefully at where the two cookies are generated and follow the stack up for each one. You will find that the two cookies come from two different methods, as shown below:

This code lives in a VM script and is a self-executing IIFE. Follow the stack up to see where the VM code is loaded from. This leads back to the home page (the 202 page), to the line containing call:

This is how the entry introduced at the beginning of the article is found, and it is usually treated as the entry point when analyzing Ruishu. In the figure, _$te is actually the eval method. The first argument, _$fY, is the Window object, and the second argument, _$F8, is the IIFE code of the VM script we saw earlier.

Once you know roughly where the Ruishu entry is, you can also use the Script category under Event Listener Breakpoints. Break on the 202 page, resume to the next breakpoint (F8), then search for the call keyword to locate the entry quickly. The Script category has two options: the first breaks at the first statement of each script, and the second breaks when a script is blocked by the Content Security Policy. The first option is generally the one you want, as shown below:

File Structure and Logic

Before analyzing cookie generation, we need to look at the code of the 202 page. It contains a meta tag with a content attribute, a reference to an external JS file similar to c.FxJzG50F.dfe1675.js, and then a self-executing script, as shown below:

The meta tag's content (part one) changes on every request. The external JS referenced in part two differs between pages, but for the same page of the same site its content is generally fixed. In part three, the self-executing code only changes its variable names each time, and the overall logic stays the same. Some of its methods will be reused later when extracting the code. The self-executing code also contains a lot of if-else control flow, and the array at its beginning (such as _$Dk in the figure above) drives that control flow.

Opened directly, c.FxJzG50F.dfe1675.js looks like garbled text. The self-executing script mainly decodes it into 10,000+ lines of normal code that run as a VM script. It also defines a global variable, window.$_ts, holding many values. This variable plays a very important role in the VM code, and the VM code also reads the meta tag's content.
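If you want to capture these three parts automatically, a rough regex-based extractor like the one below can help. It is a sketch with several assumptions: parse202Page is a hypothetical name, attributes are assumed to use the quoting shown, and the meta tag of interest is taken to be the one with the longest content attribute. That last choice is a heuristic, not something the page guarantees, and a real HTML parser is more robust.

```javascript
// Hypothetical helper: pull the dynamic parts out of a saved 202 page.
// Assumptions: double-quoted meta content; plain <script>...</script> tags;
// the wanted meta tag is the one with the longest content attribute.
function parse202Page(html) {
  let metaContent = "";
  for (const m of html.matchAll(/<meta\b[^>]*\bcontent="([^"]*)"[^>]*>/gi)) {
    if (m[1].length > metaContent.length) metaContent = m[1];
  }
  const externalScripts = []; // e.g. the c.xxxxxxxx.xxxxxxx.js file
  const inlineScripts = []; // e.g. the self-executing code
  for (const m of html.matchAll(/<script\b([^>]*)>([\s\S]*?)<\/script>/gi)) {
    const src = /\bsrc=["']([^"']+)["']/i.exec(m[1]);
    if (src) externalScripts.push(src[1]);
    else if (m[2].trim()) inlineScripts.push(m[2]);
  }
  return { metaContent, externalScripts, inlineScripts };
}
```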

Because many values and variable names change dynamically, they get in the way of analysis. We therefore pin a fixed copy of the code locally, which makes setting breakpoints and following the stack much easier. Save the code of the 202 page and its external JS files (such as c.FxJzG50F.dfe1675.js) locally. Then substitute them using the browser's built-in Overrides feature, the ReRes browser extension, or a packet-capture tool's response replacement (such as Fiddler's AutoResponder).

The VM code is the main cookie-generating code, and its many if-else control flows clearly raise the cost of analysis. AST techniques can deobfuscate some of this. For example, Nanda converts the if-else control flow into switch-case and puts the code of the same control flow into the same case. He then swaps in the restored VM code locally at the call entry. For details, see Nanda's article "Logical Analysis of 4 Generations of Certain Numbers"; give it a try if you are interested. If you don't know AST yet, you can read the earlier article "Reverse Advancement, Using AST Technology to Restore JavaScript Obfuscated Code". K Brother will write a hands-on piece on restoring Ruishu code with AST when time permits. In this article, we take the hard way!

Obtaining the VM Code and the $_ts Variable

We have seen how important the VM code and $_ts are, so the first step is to obtain them; where they come into play is covered later. Copy the code of the external JS (c.FxJzG50F.dfe1675.js) and the self-executing code of the 202 page into a local file. It can then run locally after a light environment patch: add whatever turns out to be missing. Roughly stub window, location, and document, using the copy() command in the browser console to copy concrete values. We can then get the VM code directly by hooking eval. The rough environment-patch code is as follows:

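var eval_js = "";  // receives the VM code that would be passed to eval

window = {
    $_ts: {},
    // Hook eval: capture the VM code instead of executing it
    eval: function (data) {
        eval_js = data;
    }
};

// Values copied from the browser with copy(location), desensitized
location = {
    "ancestorOrigins": {},
    "href": "http://www.脱敏处理.com.cn/new_house/new_house_detail.html",
    "origin": "http://www.脱敏处理.com.cn",
    "protocol": "http:",
    "host": "www.脱敏处理.com.cn",
    "hostname": "www.脱敏处理.com.cn",
    "port": "",
    "pathname": "/new_house/new_house_detail.html",
    "search": "",
    "hash": ""
};

// Minimal document stub used in this case
document = {
    "scripts": ["script", "script"]
};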

Check that the keys and values of $_ts match those obtained in the browser:

Note: if you download the external c.FxJzG50F.dfe1675.js directly and open it in an editor, the editor may re-encode it automatically. The result can then differ from the original data and cause errors. Copy the content over instead, or copy the response body from the packet-capture software. Compare the following two cases: the first may cause an error, and the second is normal:
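The environment stub above assigns implicit globals in Node.js. As an alternative sketch, assuming Node.js, the same eval hook can run inside the built-in vm module, which keeps the stubs out of Node's own global scope. captureVmCode is a hypothetical helper. The scripts array is assumed to hold the external JS and the 202 page's self-executing code in page order. Only calls made through window.eval are guaranteed to hit the capture function.

```javascript
const vm = require("vm");

// Hypothetical helper: run the page's scripts (in page order) inside a
// sandbox whose window.eval records its argument instead of executing it.
function captureVmCode(scripts, locationStub) {
  let vmCode = null;
  const sandbox = {
    location: locationStub,
    document: { scripts: ["script", "script"] }, // same stub as above
    $_ts: {},
  };
  // Make window.$_ts and window.eval resolve to the sandbox's own properties.
  sandbox.window = sandbox;
  sandbox.eval = (code) => {
    vmCode = code; // capture the VM code instead of running it
  };
  vm.createContext(sandbox);
  for (const src of scripts) vm.runInContext(src, sandbox);
  return { vmCode, ts: sandbox.$_ts };
}
```

In practice, the returned vmCode can be written to a file and ts compared with the $_ts seen in the browser.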

Extracting the Code

After all that, we can finally get to the main topic: extracting the code. Find a comfortable chair and settle in, because from here on your keyboard is only good for F11. Keep single-stepping; it only takes "a hundred million" little details, and you're done!

Extracting the code involves too many steps to screenshot every one, so only the more important ones are recorded here; anything missing you will have to work out yourself. First, in the 202 page we replaced, manually add a debugger statement where code execution begins. The page will then break as soon as it loads, which makes the rest of the analysis easier:

From the earlier analysis we already know the entry is where call appears, so we can search for it and set a breakpoint:

We also know that cookies are generated in two places. Search for (5); the second search result is the entry:

Fake Cookie Generation Logic

First, single-step through the fake cookie. Although it is fake, it is used later when generating the real cookie. Following it, you reach this logic:

There is a method _$8e(), where _$8e = _$Q9, and _$Q9 is nested inside _$d0. Searching for where _$d0 is called leads to the beginning of the code:

So what is the argument _$Wn? Stepping into it shows that it is a method that reads the meta tag content of the 202 page. Locally, we can simply delete the _$Wn method and pass in the content value directly, as shown below:

We also find that the code often reads values from an array by index. For example, _$PV[68] in the figure above is actually the string content. We clearly need to find where this array comes from. Searching directly for _$PV = finds the likely definition and assignment:

So we look at the _$iL method, which receives a very long string. Set a breakpoint and step in: sure enough, it generates _$PV, a 725-element array:
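Once the decoded array is available (for example, copied out of DevTools with the console utility copy(_$PV)), index lookups such as _$PV[68] can be replaced by their string values to make the extracted code easier to read. The sketch below is a plain regex pass, not an AST transform. inlineTableLookups is a hypothetical name. It assumes the indexes are numeric literals and that the array is not modified after it is built.

```javascript
// Escape characters that are special in regular expressions (names contain `$`).
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: replace lookups like _$PV[68] with string literals.
function inlineTableLookups(code, tableName, table) {
  // Match e.g. _$PV[68], but not when the name is part of a longer identifier.
  const re = new RegExp(`(?<![\\w$])${escapeRegExp(tableName)}\\[(\\d+)\\]`, "g");
  return code.replace(re, (whole, idx) => {
    const value = table[Number(idx)];
    // Only inline string entries; leave everything else untouched.
    return typeof value === "string" ? JSON.stringify(value) : whole;
  });
}
```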

Next, while extracting the code you will keep running into one particular variable; in this article it is _$sX:

Look familiar? Its value is the $_ts variable we obtained earlier. At the very beginning, window.$_ts is assigned to _$sX:

Keep stepping and you will reach the following logic:

Here we meet six arrays that already hold values, so we need to find where they come from. Searching for any of the array names reveals where they are defined and assigned:

The assignment clearly calls the _$rv method. Searching for _$rv shows that it is called at the beginning:

Nothing special follows: keep single-stepping until a final join('') operation produces the fake cookie:

Next, the cookie name FSSBBIl1UgzbN7N80T is generated, the cookie is written to document.cookie, and localStorage.$_ck is assigned a value. The localStorage content can simply be copied over, since it has little impact.

Real Cookie Generation Logic

Now single-step through the real cookie, which in this article is _$ZN(768, 1);. You will see it enter the endless if-else control flow:

What should we do locally here? My approach is to name each function after _$Hn plus the control-flow number, so function _$Hn768(){} holds all the code that runs under control flow No. 768. Continuing on, most of the real cookie is generated under control flow No. 747, so below we mainly follow its steps. The code extracted for control flow No. 747 looks roughly like this:
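To illustrate this naming convention, here is a toy model of a numbered dispatcher next to its extracted form. It is not Ruishu code. The case numbers are borrowed from the text, but what each case does here is made up, and the real dispatcher has far more branches.

```javascript
// Toy model only (not real Ruishu code): a numbered dispatcher in which
// nested if-else comparisons route a case number to its code block.
function dispatch(caseId, trace) {
  if (caseId < 700) {
    if (caseId === 668) trace.push("668: build 16-element array");
  } else if (caseId < 760) {
    if (caseId === 747) {
      dispatch(668, trace); // case 747 calls into case 668
      trace.push("747: assemble cookie");
    }
  } else if (caseId === 768) {
    dispatch(747, trace); // case 768 calls into case 747
  }
  return trace;
}

// The same logic after extraction: one named function per case number,
// following the _$Hn<number> naming convention described above.
function _$Hn668(trace) {
  trace.push("668: build 16-element array");
  return trace;
}
function _$Hn747(trace) {
  _$Hn668(trace);
  trace.push("747: assemble cookie");
  return trace;
}
function _$Hn768(trace) {
  return _$Hn747(trace);
}
```

Both forms produce the same trace. The extracted form, however, lets you set breakpoints and compare intermediate arrays case by case.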

Fake cookies

While single-stepping through control flow No. 747, you will at one point enter control flow No. 709. It takes the fake cookie generated earlier and returns an array after a series of operations:

At this point, if the code we have been extracting locally in parallel is correct, the returned array should match. Later elements will differ, because parameters such as timestamps take part in the computation:

Automation Tool Detection

Continuing with control flow No. 747, you enter control flow No. 268 and then control flow No. 154. Here there are some checks for automation tools, as shown below:

A variable _$iL is defined here and set to 1 if a check fails. This variable is later assigned to _$aW, so we keep the same logic locally and make it false as well. In fact, since we are not using automation tools, this detection is unnecessary, and we can return false directly:

The 20-Element Core Array

Continuing with control flow No. 268, you enter control flow No. 668, which performs two operations. One generates a 16-element array; the other takes 4 values from $_ts. Together they form a 20-element array, which is the core of Ruishu: if the mapping is wrong, the request will not pass. In the 5th generation, the processing logic of this part is more complex.

This is not simply a matter of reading key-value pairs from $_ts. When extracting the code, you may wonder why reading the value here locally gives you a string rather than a number, as in the following case:

In fact, the $_ts values we obtained at the beginning are processed a second time. Take the first one, _$sX._$Xb, as an example: searching directly for _$sX._$Xb finds this place:

Clearly, _$sX._$Xb has been reassigned. On the right-hand side of the equals sign, _$sX._$Xb is read first; its value is _$Rm, the same as the corresponding value in our original $_ts. Next we need to find out what _$sX["_$Rm"] is. A search shows that a method is assigned to it at the beginning, and calling this method produces the new value:

The other three values follow the same pattern. Their assignment code is:

_$sX._$Xb = _$sX[_$sX._$Xb](_$BH, _$DP);
_$sX._$oI = _$sX[_$sX._$oI](_$ZJ, _$DS);
_$sX._$EN = _$sX[_$sX._$EN]();
_$sX._$D9 = _$sX[_$sX._$D9](_$iL);

It should actually be:

_$sX._$Xb = _$sX["_$Rm"](_$BH, _$DP);
_$sX._$oI = _$sX["_$Nw"](_$ZJ, _$DS);
_$sX._$EN = _$sX["_$Uh"]();
_$sX._$D9 = _$sX["_$ci"](_$iL);

Going a step further, it's actually:

_$sX._$Xb = _$1k(_$BH, _$DP);
_$sX._$oI = _$jH(_$ZJ, _$DS);
_$sX._$EN = _$9M();
_$sX._$D9 = _$oL(_$iL);

For static analysis this is fine, and we can hard-code these names for now. In practice, however, the values are dynamic, so what should we do? Let's compare a few more samples to find the pattern:

The names differ every time, but the method at a given position is always the same one. In other words, only method and variable names change, while the implementation logic does not. So we only need to know the positions of these four values to get the correct values. Locally, we can do the following:

1. First use a regular expression to match these four values, e.g. [_$sX._$Xb, _$sX._$oI, _$sX._$EN, _$sX._$D9];

2. Then match the 20 assignment statements at the beginning of the VM code, e.g. _$sX._$RH = _$wI; _$sX._$i5 = _$n5; and so on;

3. Then look up the values these four keys map to in $_ts, i.e. _$sX._$Xb = $_ts._$Xb = _$Rm. Next, find the positions of the methods defined for these values among the 20 assignment statements: for example, _$sX._$Rm = _$1k; sits at position 7 among the 20 statements (0-based);

4. Once we know the positions of these four methods among the 20 assignment statements, we can match the names at the same positions in our local code and replace them dynamically. This of course assumes we have already extracted a full set of code locally:

After such processing, the accuracy of these four values can be guaranteed.
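Steps 1 to 4 can be sketched as a regex-based helper. locateTsMethods is a hypothetical name. It assumes the statement shapes shown above (for example _$sX._$Xb = _$sX[_$sX._$Xb](...) and _$sX._$RH = _$wI;). It also assumes the 20 simple assignments are the first such statements in the VM code, and real VM code may need tighter patterns.

```javascript
// Escape characters that are special in regular expressions (names contain `$`).
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper for steps 1-4.
// vmCode: VM source text; alias: local name of $_ts (e.g. "_$sX"); ts: captured $_ts.
function locateTsMethods(vmCode, alias, ts) {
  const a = escapeRegExp(alias);
  // Step 1: reassignments of the form  _$sX._$Xb = _$sX[_$sX._$Xb](...)
  const reassign = new RegExp(`${a}\\.(_\\$\\w+)\\s*=\\s*${a}\\[${a}\\.\\1\\]`, "g");
  const keys = [...vmCode.matchAll(reassign)].map((m) => m[1]);
  // Step 2: the first 20 simple assignments of the form  _$sX._$RH = _$wI;
  const simple = new RegExp(`${a}\\.(_\\$\\w+)\\s*=\\s*(_\\$\\w+);`, "g");
  const table = [...vmCode.matchAll(simple)]
    .slice(0, 20)
    .map((m) => ({ slot: m[1], fn: m[2] }));
  // Step 3: map each key through $_ts to a slot name and find its position.
  // Step 4 then uses `position` to pick the matching name in the local code.
  return keys.map((key) => {
    const slot = ts[key];
    const position = table.findIndex((entry) => entry.slot === slot);
    return { key, slot, position, fn: position >= 0 ? table[position].fn : null };
  });
}
```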

Other Places Where $_ts Values Are Used

Besides the 4 $_ts values used in the 20-element array above, 7 more values are used elsewhere, and they can be located directly by searching. These 7 are relatively simple: they are the fixed values at positions 2, 3, 4, 15, 16, 17, and 19 in $_ts. Likewise, find the corresponding positions and replace them dynamically:
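One possible way to do this replacement is sketched below. It assumes that the local code refers to these entries by their $_ts key names and that key order in $_ts stays stable between loads (the text only states that the positions are fixed). remapTsKeys is a hypothetical helper, and positions are 1-based to match the numbering above.

```javascript
// Escape characters that are special in regular expressions (names contain `$`).
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: for each 1-based position, pair the key name in the
// reference $_ts (the one the local code was extracted against) with the key
// at the same position in a freshly captured $_ts, then rewrite the code.
function remapTsKeys(localCode, refTs, newTs, positions) {
  const refKeys = Object.keys(refTs);
  const newKeys = Object.keys(newTs);
  const pairs = new Map(positions.map((p) => [refKeys[p - 1], newKeys[p - 1]]));
  // Longest names first, and never match inside a longer identifier.
  const alternation = [...pairs.keys()]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const re = new RegExp(`(?<![\\w$])(?:${alternation})(?![\\w$])`, "g");
  // A single pass avoids chained replacements (A -> B, then B -> C).
  return localCode.replace(re, (name) => pairs.get(name));
}
```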

Precautions

Pay special attention to the beginning of the VM code, which directly calls some methods, and some variable values are generated by these methods. If, while single-stepping, you find that some parameters are wrong or missing, check these methods at the beginning: the values may have been generated right at the start.

MmEwMD Suffix Generation Logic

All subsequent XHR requests carry a suffix whose value is also generated by JS and changes every time. The suffix name may differ between websites; in this example it is MmEwMD. First set an XHR breakpoint that triggers when the request URL contains MmEwMD=, then refresh the page:

You can see that the URL passed to l.open() is still normal, but by the time the request reaches l.send() the URL carries the suffix. l.open() is actually xhr.open(), and it clearly differs from the normal one. It is also defined in the VM code, so it must be a rewritten method. Compare it with the normal one:

Following it into the VM code, you can see that after the _$sd(arguments[1]) call the URL becomes a complete link with the suffix:

Step into _$sd. The first part does some processing on the URL, and then there is a step that enters control flow No. 779. This essentially repeats the steps used to generate the cookie, so just follow along.
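Two small tools help here. The first checks whether a function is still native by inspecting its source text through Function.prototype.toString; this is how a rewritten xhr.open can be told apart from the built-in one. The second is a toy model of an open() override, not Ruishu's code. It passes the URL through a rewriter (the role _$sd plays in the VM) before calling the original. wrapOpen and the suffix value in the usage are made up for illustration.

```javascript
// True if fn's source text is the engine's "[native code]" placeholder.
function isNativeFunction(fn) {
  return /\{\s*\[native code\]\s*\}\s*$/.test(Function.prototype.toString.call(fn));
}

// Toy model (not Ruishu's code) of a rewritten open(): the URL argument is
// passed through `rewriteUrl` before the original method runs.
function wrapOpen(proto, rewriteUrl) {
  const originalOpen = proto.open;
  proto.open = function (...args) {
    args[1] = rewriteUrl(args[1]); // like _$sd(arguments[1]) in the VM
    return originalOpen.apply(this, args);
  };
}

// In the browser, a quick check in the console:
// isNativeFunction(XMLHttpRequest.prototype.open)  // false once it has been rewritten
```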

Take Advantage of Watch Tracking

The Watch panel in DevTools continuously tracks the value of a variable. With this many control flows, watching the relevant variables tells you which control flow you are in and how the generated array is changing, so you never lose track of which step you are on.

Verifying the Result

If the whole process is correct and the code has been extracted properly, requests carrying the correct cookie and the correct suffix will succeed:

K哥爬虫 (K Brother Crawler)
