Disclaimer

Everything in this article is for learning and exchange only and must not be used for any other purpose. Complete code is not provided, and captured packets, sensitive URLs, data interfaces, and similar content have been desensitized. Commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use.

The author is not responsible for any incident caused by unauthorized use of the techniques explained in this article. If there is any infringement, please contact the author through the official account [K Brother Crawler] and the content will be deleted immediately.

Foreword

01

Ruishu dynamic security Botgate (a bot firewall) is built around "dynamic security" technology. Through dynamic encapsulation, dynamic verification, dynamic obfuscation, dynamic tokens, and related techniques, it continuously changes the underlying code of server web pages, which increases the "unpredictability" of server behavior. This provides all-round "active protection" from the client to the server and strong security for all kinds of Web and HTML5 applications.

Brother K's previous article, "Per capita Ruishu series, Ruishu 4th generation JS reverse analysis", describes in detail the characteristics of Ruishu, how to tell the versions apart, and the structure of Ruishu's code and what each part does. That material is not repeated here; readers unfamiliar with it should read the previous article first.

Locating the cookie entry point

The Ruishu 5th-generation site used in this case is: aHR0cHM6Ly93d3cubm1wYS5nb3YuY24vZGF0YXNlYXJjaC9ob21lLWluZGV4Lmh0bWw=

The fastest way to locate where the cookie is set is a hook. Inject the following hook code through a Fiddler plug-in, a Tampermonkey script, a browser extension, or similar:

(function () {
    // Strict mode: surface all errors
    'use strict';
    // document is the object being hooked; here we hook its cookie property
    var cookieTemp = "";
    Object.defineProperty(document, 'cookie', {
        // Hook the setter, i.e. every assignment to document.cookie
        set: function (val) {
            // Execution pauses here on each write, which makes it quick
            // to locate the code that sets the cookie
            console.log('Hook caught cookie set ->', val);
            debugger;
            cookieTemp = val;
        },
        // Hook the getter, i.e. every read of document.cookie
        get: function () {
            return cookieTemp;
        }
    });
})();
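
The hook above replaces the cookie property outright, so cookies are no longer really stored while it is active. If the page must keep working, one option is to wrap the original accessor instead. The following is only a sketch under assumptions, not the author's code: `hookAccessor` is a hypothetical helper, and in a browser it would be called with `Document.prototype` and `'cookie'`; a stand-in object is used here so that it runs under Node.

```javascript
// Wrap an existing accessor property so that writes are reported and then
// forwarded to the original setter. hookAccessor is a hypothetical helper name.
function hookAccessor(proto, name, onSet) {
    const desc = Object.getOwnPropertyDescriptor(proto, name);
    if (!desc || !desc.get || !desc.set) {
        throw new Error(name + ' is not a get/set accessor on this object');
    }
    Object.defineProperty(proto, name, {
        configurable: true,
        enumerable: desc.enumerable,
        get: function () {
            return desc.get.call(this);
        },
        set: function (val) {
            onSet(val);               // in a browser, put `debugger;` here
            desc.set.call(this, val); // keep the original behaviour
        }
    });
}

// Stand-in for Document.prototype so the sketch runs outside a browser.
const fakeProto = {
    _jar: [],
    get cookie() { return this._jar.join('; '); },
    set cookie(v) { this._jar.push(String(v).split(';')[0]); }
};

const seen = [];
hookAccessor(fakeProto, 'cookie', function (val) { seen.push(val); });
fakeProto.cookie = 'a=1; path=/';
console.log(seen, fakeProto.cookie);
```

The write is logged and the stand-in cookie jar is still updated, which is the property the simple hook gives up.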

Once the breakpoint is hit, follow the stack upward to find the code that assembles the cookie and assigns it to document.cookie. It has a structure similar to the following:

02

Continue up the stack. As in Ruishu's 4th generation, position (772, 1) is the entry point. The 4th generation has a step that generates a fake cookie; the 5th generation does not, as shown in the following figure:

03

Follow the stack to the home page code, where we reach the familiar call site. In the figure, _$ug is actually the eval method, the first argument _$Cs is the Window object, and the second argument _$Dm is the IIFE self-executing code that we saw running in the VM earlier.
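
The call described here has roughly the shape `_$ug.call(_$Cs, _$Dm)`. A minimal sketch of that pattern follows, assuming `_$ug` is `eval`; `globalThis` stands in for `window`, and the code string is made up in place of the real VM code.

```javascript
// evalRef plays the role of _$ug. Calling eval through a reference makes it an
// "indirect" eval, so the code string runs in the global scope.
const evalRef = eval;
const globalObj = globalThis;   // plays the role of _$Cs (window)
// Plays the role of _$Dm: an IIFE delivered as a string (placeholder content).
const vmCode = '(function () { globalThis.__vmRan = true; return 42; })()';

const result = evalRef.call(globalObj, vmCode);
console.log(result, globalObj.__vmRan);
```

Because everything goes through one eval call, replacing that function is enough to capture the whole VM code, which is what the next section relies on.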

04

Obtaining the VM code and the $_ts variable

Obtaining the VM code and the $_ts variable is the first step. As with the 4th generation, copy the code of the external JS file (e.g. fjtvkgf7LVI2.a670748.js) and the self-executing code of the 412 page into one file and run it directly. Only a light environment patch is needed: fill in whatever is reported missing, which is roughly window, location, and document. The concrete values can be copied straight from the browser console with the copy() command. The VM code can then be captured directly by hooking eval. Getting $_ts differs slightly from the 4th generation. For the 4th generation, we simply read window.$_ts after running the code. In the 5th generation, an operation clears $_ts after the code has run; you can follow the stack to see the logic yourself. Either delete the clearing logic, or define a global variable and export the value of $_ts at the call site:

05

The environment patch looks roughly like this:

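// Receives the VM code that the page passes to eval
var eval_js = ""
// Receives the exported $_ts value
var rs_ts = ""

window = {
    $_ts: {},
    // Hook eval: capture the VM code instead of executing it
    eval: function (data) {
        eval_js = data
    }
}

// Values copied from the browser console; REDACTED marks the desensitized host
location = {
    "ancestorOrigins": {},
    "href": "https://REDACTED/datasearch/home-index.html",
    "origin": "https://REDACTED",
    "protocol": "https:",
    "host": "www.REDACTED.cn",
    "hostname": "www.REDACTED.cn",
    "port": "",
    "pathname": "/datasearch/home-index.html",
    "search": "",
    "hash": ""
}

document = {
    "scripts": ["script", "script"]
}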
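
One possible way to keep a copy of $_ts despite the clearing step is to define it as an accessor on the stub window that remembers every non-empty object assigned to it. This is a sketch of the "global variable" idea, not the author's exact patch: `rs_ts_history`, `snapshot`, and the keys `cd` and `nsd` are hypothetical, and a shallow copy is assumed to be enough. If the real script empties the object in place instead of reassigning it, the copy has to be taken at the call site instead.

```javascript
var rs_ts_history = [];   // hypothetical: every non-empty value assigned to $_ts
var eval_js = '';

function snapshot(obj) {
    // Shallow copy, so later changes to the original do not affect the snapshot
    return Object.assign({}, obj);
}

var win = {
    eval: function (code) { eval_js = code; }   // capture the VM code, do not run it
};
var current = {};
Object.defineProperty(win, '$_ts', {
    get: function () { return current; },
    set: function (val) {
        if (val && Object.keys(val).length > 0) {
            rs_ts_history.push(snapshot(val));
        }
        current = val;
    }
});

// Simulated page script: fills $_ts, hands code to eval, then clears $_ts.
win.$_ts = { cd: 'abc', nsd: 12345 };
win.eval('/* VM code would be here */');
win.$_ts = {};

console.log(eval_js.length > 0, rs_ts_history[rs_ts_history.length - 1]);
```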

Get the VM code and the $_ts variable:

06

Making good use of Watch

Before following the stack, it helps to know the Watch feature of the browser developer tools, which continuously tracks the value of a variable. Ruishu has a great many control flows, so watching the right variables tells you which control flow you are currently in and how the generated array is changing, and keeps you from losing your place. As shown in the figure below, _$S8 indicates large control flow No. 279, _$5x indicates the branch within that control flow, and _$mz is the 128-element large array.
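
For readers new to this style of obfuscation, the toy dispatcher below illustrates what a "control flow number" and a "branch" mean. It is a made-up example of control-flow flattening, not Ruishu's code; `run`, `flowId`, `step`, and `trace` are hypothetical names. Watching `flowId` and `step` here corresponds to watching `_$S8` and `_$5x` in the Watch panel.

```javascript
// Toy flattened control flow: one function, a numeric flow id selects the logic,
// and an inner step counter selects the branch within that flow.
const trace = [];

function run(flowId, arg) {
    let step = 0;
    let acc = [];
    while (true) {
        trace.push(flowId + ':' + step);   // what a Watch expression would show
        switch (flowId * 10 + step) {
            case 10: acc = new Array(4).fill(0); step = 1; break;  // flow 1, step 0
            case 11: acc[0] = arg & 255; step = 2; break;          // flow 1, step 1
            case 12: acc[1] = run(2, arg); return acc;             // flow 1 enters flow 2
            case 20: return (arg >> 8) & 255;                      // flow 2, step 0
            default: throw new Error('unknown state');
        }
    }
}

console.log(run(1, 0x1234), trace);
```

The nested call from flow 1 into flow 2 mirrors the way the article moves between numbered control flows and back again.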

07

Stack-trace analysis

As usual, save one copy of the 412 page code and replace it locally so that it stays fixed, then start following the stack. We start directly from (772, 1). (The control-flow numbers and step numbers used in this article are the author's own labels. The step numbers do not correspond to actual execution steps and only mark the key ones.)

08

Stepping through, _$qh is the incoming parameter 1, and execution is about to enter control flow No. 742:

09

On entering control flow No. 742, step 1 obtains a timestamp through a method. Step into that method: it computes a difference between timestamps, and you will find that two variables, _$tb and _$t1, already have values:

10

11

These two values are also timestamps. Where do they come from? Search for the two variables directly; there are only a few results, so set a breakpoint on each. After refreshing and hitting the breakpoint, follow the stack backward and you will find that control flow No. 703 ran first:

12

So step through control flow No. 703 first. Its step 1 enters control flow No. 699, which returns an array. There is nothing special here; just extract the code directly:

13

Steps 2 and 3 of control flow No. 703 each take a value from the array:

14

15

Steps 4, 5, and 6 of control flow No. 703 generate two timestamps and assign them to the _$tb and _$t1 variables mentioned above. The methods involved are nothing special; search for and fill in whatever is missing:

16

17

18

Step 7 of control flow No. 703 modifies a value in $_ts (in the VM code, $_ts is assigned to another variable, _$iw in the figure below). The original value of _$iw._$uq is _$ou, and the modified value is 181. This is also one of the values in the key 4-element array that appears later; the specific logic is discussed further on.

19

With control flow No. 703 finished, we return to control flow No. 742. Step 2 of control flow No. 742 assigns the previously generated timestamp to another variable.

20

Step 3 of control flow No. 742 enters control flow No. 279, which is the key to generating the 128-element array.

21

Control flow No. 279, step 1: a variable is defined:

22

Control flow No. 279, step 2: enter control flow No. 157, which mainly performs automation detection.

23

24

Control flow No. 279, steps 3, 4, and 5: some operations are performed that change the values of several global variables, which are used later when filling the array.

25

26

27

Control flow No. 279, step 6: a 128-element empty array is initialized. All subsequent operations fill this array with values.

28

Control flow No. 279, step 7: enter control flow No. 695, which generates a 20-element array.

29

Looking inside control flow No. 695, step 1 takes a value from $_ts and generates a 16-element array.

30

Control flow No. 695, step 2: take four values from $_ts and combine them with the previous 16-element array into a 20-element array.

31

Note where these four values come from. Taking the second value, _$iw._$KI, as an example, a search finds the statement _$iw._$KI = _$iw[_$iw._$KI](_$bl, _$n2); . Look at the right-hand side of the equals sign first: the value of _$iw._$KI is _$Mo, and _$iw["_$Mo"] is simply _$iw._$Mo. An earlier definition reads _$iw._$Mo = _$1D, where _$1D is a method, so the original statement is equivalent to _$iw._$KI = _$1D(_$bl, _$n2) . The other three values are produced in a similar way.
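
The self-replacing lookup can be reproduced in a few lines. In this sketch the object and the method body are placeholders; only the shape of the statement `_$iw._$KI = _$iw[_$iw._$KI](_$bl, _$n2)` follows the text.

```javascript
// ts stands in for _$iw (the renamed $_ts object).
const ts = {};

function helper(a, b) {      // stands in for _$1D; the body is a placeholder
    return a + b;
}

ts._$Mo = helper;            // earlier definition: _$iw._$Mo = _$1D
ts._$KI = '_$Mo';            // before the call, _$KI only holds a property name

// The right-hand side is evaluated first: ts['_$Mo'](1, 2) is helper(1, 2).
// Only then is the result written back over the name stored in _$KI.
ts._$KI = ts[ts._$KI](1, 2);

console.log(ts._$KI);
```

After the statement runs, the property no longer holds a method name but the computed value, which is why searching for the assignment is the quickest way to find where each value is produced.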

32

Control flow No. 695 ends and we return to control flow No. 279. Step 8 converts the earlier timestamp into an 8-element array.
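
A common way to turn a millisecond timestamp into eight values is to split it into big-endian bytes. The sketch below assumes that encoding; the byte order and the helper name `numberToBytes8` are assumptions, so compare the output against the array seen in the debugger before relying on it.

```javascript
// Split a non-negative integer (up to 2^53 - 1) into 8 big-endian bytes.
// Division is used instead of >> because bitwise operators truncate to 32 bits.
function numberToBytes8(n) {
    const out = new Array(8).fill(0);
    for (let i = 7; i >= 0; i--) {
        out[i] = n % 256;
        n = Math.floor(n / 256);
    }
    return out;
}

console.log(numberToBytes8(Date.now()));
console.log(numberToBytes8(258));   // [0, 0, 0, 0, 0, 0, 1, 2]
```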

33

Control flow No. 279, step 9: a value is added to the 128-element array.

34

Where does this value, _$ae, come from? Search, set breakpoints, and follow the stack: it is produced by control flow No. 178, which runs at the very beginning. Just follow it through.

35

36

Control flow No. 279, step 10: another value is added to the 128-element array, namely the value that control flow No. 279 was started with.

37

38

Control flow No. 279, steps 11 to 14: timestamp-related calculations, followed by the generation of two 2-element arrays. Note that the two variables here, _$ll and _$ed, may have values when the cookie is refreshed or the suffixes are generated; when only the home page is accessed they have no value.

39

40

41

42

Control flow No. 279, step 15: a 4-element array, _$bl, is added to the 128-element array. Searching shows that it is obtained through control flow No. 723.

43

44

Control flow No. 723 takes a certain value from $_ts, performs a calculation that generates a 16-element array, then slices off the first 4 elements and returns them.

45

46

Control flow No. 279, step 16: an 8-element array, _$Yb, is added to the 128-element array.

47

Search for the 8-element array _$Yb and set breakpoints as before; execution stops at an assignment statement:

48

You can see that the value of _$EJ is _$Yb. Following the stack backward shows that control flows No. 657, No. 10, and No. 777 were passed through in turn, with control flow No. 777 as the entry:

49

50

51

If you single-step through control flow No. 777, you will find that it has many steps, and some statements in the middle are hard to follow and easy to get lost in. So here we focus directly on control flow No. 657: control flow No. 777 goes to control flow No. 10, and from there to control flow No. 657. The intermediate steps are ignored for now and revisited when something turns out to be missing. (Many later operations, such as value assignments, are implemented in control flow No. 777, so keep an eye on it.) In the local code, this logic looks as shown in the following figure:

52

Here we single-step through control flow No. 657 directly. Steps 1 and 2 create a new method.

53

54

Pay attention here, because it is easy to lose track. First step into the _$bH method and set a breakpoint there so that the next run stops inside it. Single-stepping from there enters another, smaller control flow, as shown below:

55

56

Begin single-stepping small control flow No. 96. Step 1 defines a variable.

57

Small control flow No. 96, step 2: this step involves _$PI and _$fT. The value of _$PI is actually window.localStorage.$_YWTU. window.localStorage holds many values, which are discussed at the end of this article; some of them are related to browser fingerprints. For now it is enough to know that this is simply a value.

58

Small control flow No. 96, step 3: enter small control flow No. 94, which finally generates an 8-element array. This is the _$Yb value we are after.

59

Nothing special follows, so the intermediate steps are omitted; just extract the code as you go. Small control flow No. 96, step 4, then assigns the value of _$EJ to _$Yb.

60

Don't stop here; there are still a few key steps to follow. Small control flow No. 96, step 5, uses a construct similar to the earlier one.

61

As before, first set a breakpoint inside _$pu, then single-step.

62

This leads to another small control flow, as shown in the following figure:

63

Small control flow No. 10, step 1: take the value of window.localStorage.$_cDro, convert it to an int, and assign it to _$5s. This _$5s is later added to the large array.

64

Small control flow No. 10 has a few more steps, but they are of no use here and can be skipped. Its last step returns to small control flow No. 96.

65

Small control flow No. 96 then has nothing more to do and returns to control flow No. 657.

66

At this point we have _$Yb. Control flow No. 777 is set aside for now, along with some code that does not need to be extracted yet; we will come back to it when it is needed. Return to control flow No. 279 and continue from the earlier steps. In step 17, the variable _$5s is passed through control flow No. 264 to produce a value that is added to the 128-element array. _$5s is the value we met while tracing _$Yb: it is obtained through control flow No. 777 and is window.localStorage.$_cDro converted to an int.

67

Control flow No. 279, steps 18, 19, and 20: two fixed values and an 8-element array are added to the 128-element array.

68

69

70

Control flow No. 279, step 21: an undefined placeholder is added to the 128-element array; a later operation fills it with a value.

71

72

Control flow No. 279, step 22: enter control flow No. 58, which depends on the value of window.localStorage.$_fb. If that value exists, a 20-element array is produced; if not, the result is undefined. Control flow No. 58 has only one step, which returns a variable, _$0g in this article.

73

74

Where does _$0g come from? Again, search directly and set breakpoints: it is obtained through control flow No. 112. Following the stack backward, it also goes through control flow No. 777 first. As before, we skip the intermediate process and look at control flow No. 112 directly.

75

In this article, the parameter passed to control flow No. 112 is _$bd[279], that is, $_fb. Step 1 of control flow No. 112 enters control flow No. 247.

76

Control flow No. 247 has three steps: it assigns window.localStorage to a variable, then takes the value of $_fb and returns it.

77

78

79

Steps 2 and 3 of control flow No. 112 are a try-catch statement: window.localStorage.$_fb is used to compute a 25-element array, and the first 20 elements are returned. This is the _$0g value we needed.

80

81

Control flow No. 279, step 23: the 20-element array computed from window.localStorage.$_fb is added to the 128-element array. Note that if window.localStorage.$_fb has no value, nothing is added in this step.
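
Steps 22 and 23 boil down to "decode a localStorage value if it exists, keep the first 20 elements, and append only on success". A sketch under those assumptions follows. `readSegment` is a hypothetical name, `decode` is a stand-in for the real calculation (which is not shown in this article), and a plain object replaces `window.localStorage` so that it runs under Node.

```javascript
// Returns the first 20 elements derived from storage[key], or undefined when the
// key is missing or decoding throws. `decode` is a placeholder, not the real algorithm.
function readSegment(storage, key, decode) {
    try {
        const raw = storage[key];
        if (!raw) {
            return undefined;
        }
        return decode(raw).slice(0, 20);
    } catch (e) {
        return undefined;
    }
}

// Placeholder decoder: the character codes of the stored string.
function decode(raw) {
    return Array.from(raw, function (ch) { return ch.charCodeAt(0); });
}

const storage = { $_fb: 'abcdefghijklmnopqrstuvwxy' };   // 25 characters, made-up value
const big = [];
['$_fb', '$_missing'].forEach(function (key) {
    const seg = readSegment(storage, key, decode);
    if (seg !== undefined) {
        big.push(seg);        // nothing is appended for a missing key
    }
});

console.log(big.length, big[0].length);
```

The same "append only if present" shape applies to the other localStorage keys read in the following steps, which is why the final array length varies.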

82

Control flow No. 279, step 24: a bit operation is performed on a variable, then window.localStorage.$_f0 is read and processed. If $_f0 is empty, no value is added to the 128-element array.

83

84

85

Control flow No. 279, step 25: a bit operation is performed on a variable, then window.localStorage.$_fh0 is read and processed. If $_fh0 is empty, no value is added to the 128-element array.

86

87

88

Control flow No. 279, step 26: a bit operation is performed on a variable, then window.localStorage.$_f1 is read and processed. If $_f1 is empty, no value is added to the 128-element array.

89

90

91

Control flow No. 279, step 27: enter control flow No. 611, which mainly inspects window.navigator.connection.type, that is, the network information exposed by NetworkInformation. It checks whether type is bluetooth, cellular, ethernet, wifi, or wimax and returns a numeric result, with 0 among the possible return values.
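
A check of this kind can be sketched as a lookup table. The list of type names comes from the text; the numeric codes, the fallback of 0 for anything else, and the name `connectionTypeCode` are assumptions made for this sketch and should be replaced by what the real control flow returns.

```javascript
// Map navigator.connection.type to a number. The codes are placeholders.
const TYPE_CODES = { bluetooth: 1, cellular: 2, ethernet: 3, wifi: 4, wimax: 5 };

function connectionTypeCode(nav) {
    const conn = nav && nav.connection;
    const type = conn && conn.type;
    // Unknown or missing types fall back to 0 in this sketch.
    return Object.prototype.hasOwnProperty.call(TYPE_CODES, type) ? TYPE_CODES[type] : 0;
}

console.log(connectionTypeCode({ connection: { type: 'wifi' } }));
console.log(connectionTypeCode({}));
```

When patching the environment in Node, passing a stub `navigator` object like the ones above keeps this step from throwing.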

92

93

94

Control flow No. 279: the next several steps are similar, so they are collectively called step 28 here. First a bit operation is performed on a variable, then window.localStorage.$_fr, window.localStorage.$_fpn1, window.localStorage.$_vvCI, and window.localStorage.$_JQnh are read and processed in turn. If these values are empty, nothing is added to the 128-element array.

96

97

98

99

Control flow No. 279, step 29: a fixed value, 4, is added to the 128-element array. In this article the variable is named _$kW.

100

Where does _$kW come from? The routine is the same as before: search and set breakpoints. It is also produced early on through control flow No. 777, as shown in the following figure:

101

Continuing with control flow No. 279 and skipping some intermediate bit operations on variables: steps 30 and 31 compute the length of https:443 and add a fixed value and a 9-element array to the 128-element array.
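
The string https:443 has nine characters, which fits the 9-element array mentioned here if the array holds its character codes. That reading is an assumption, as are the helper name `protocolPortString` and the example URL; the sketch only shows how such a string and its length could be derived.

```javascript
// Build "protocol:port" for a URL, filling in the default port when none is given.
function protocolPortString(href) {
    const u = new URL(href);
    const defaults = { 'https:': '443', 'http:': '80' };
    const port = u.port || defaults[u.protocol] || '';
    return u.protocol + port;   // u.protocol already ends with ':'
}

const s = protocolPortString('https://example.com/datasearch/home-index.html');
const codes = Array.from(s, function (ch) { return ch.charCodeAt(0); });
console.log(s, s.length, codes);
```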

102

103

For control flow No. 279, the next few steps are all taking values, which are almost the same, so they are collectively referred to as step 32.

104

105

106

107

108

109

Control flow No. 279, step 33: slot 12 of the 128-element array previously held undefined; here it is filled with a 4-element array. One of the variables involved is _$8L. While following the earlier steps, we saw one variable repeatedly undergoing bit operations; _$8L is the result.

110

Control flow No. 279, last two steps: only the leading slots of the 128-element array that hold a value are kept (the first 21 here). How many are kept depends on certain values in window.localStorage: the array is longer when they are present and shorter when they are not. The elements are then merged into one final large array, which is returned, and control flow No. 279 ends.
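
The final merge can be pictured as flattening a sparse array of slots. This sketch assumes each filled slot holds either a single number or an array of numbers and that unfilled slots are simply skipped; `flattenSlots` is a hypothetical name and the slot contents are made up.

```javascript
// Join the filled slots of the big array into one flat array.
// A slot may hold a number, an array of numbers, or nothing at all.
function flattenSlots(slots) {
    const out = [];
    for (let i = 0; i < slots.length; i++) {
        const v = slots[i];
        if (v === undefined) {
            continue;                 // slot was never filled
        }
        if (Array.isArray(v)) {
            for (let j = 0; j < v.length; j++) {
                out.push(v[j]);
            }
        } else {
            out.push(v);
        }
    }
    return out;
}

const slots = new Array(128);         // 128 empty slots, as in step 6
slots[0] = 3;
slots[1] = [1, 2, 3, 4];
slots[2] = 181;
console.log(flattenSlots(slots));     // [3, 1, 2, 3, 4, 181]
```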

111

112

Returning to the logic from the beginning: control flow No. 279 ends and execution returns to control flow No. 742. In step 2, a variable is defined and a 32-element array is generated.

113

114

Control flow No. 742, step 3: take a value from $_ts and assign it to a variable.

115

Control flow No. 742, step 4: merge the large array returned by control flow No. 279 with the $_ts value taken in the previous step, and compute a value from the merged result.

116

Control flow No. 742, still in step 4: further process the value just obtained to get a 4-element array, then combine it with the large array.

117

Control flow No. 742: the next few steps perform various operations on the timestamps and are collectively called step 5 here.

118

119

120

121

Control flow No. 742, step 6: the 4 timestamps obtained in the previous step are used to compute a 16-element array.

122

Control flow No. 742, step 7: an XOR operation is applied to the 16-element array from the previous step.

123

Control flow No. 742, step 8: the 16-element array from the previous step is converted into a string.

124

Control flow No. 742, step 9: the cookie value is generated. _$bd[274] is a fixed value, generally regarded as the version number. The string from the previous step, the large array obtained earlier, and a 32-element array are calculated and combined to give the final result.

125

When control flow No. 742 ends, execution returns to control flow No. 772, where a method assembles the cookie and assigns it to document.cookie. The whole process is then complete.

126

127

128

The $_ts values used in the code need to be matched and replaced dynamically. These steps are similar to the 4th generation and are not repeated here; refer to Brother K's 4th-generation article and handle them the same way.

129

Suffix generation

In this example the request header carries a sign parameter, and the Query String Parameters contain two suffix parameters. As in the 4th generation, both suffixes are generated by Ruishu.

130

131

The approach is the same as for the 4th generation: set an XHR breakpoint. Let the page finish loading first, then open the developer tools. After getting past the infinite debugger, click search and the breakpoint is hit, as shown in the following figure:

132

Following the stack up to hasTokenGet, the code is obfuscated with jsjiami v6 from sojson, which is not worth dwelling on. The key part is the jsonMD5ToStr method: it first applies some encoding to the parameters passed in and finally returns hex_md5. The result matches that of an online MD5 tool, which indicates standard MD5.
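
Instead of an online tool, a candidate function can be checked against Node's built-in crypto module. The helper names `md5Hex` and `looksLikeStandardMd5` are made up for this sketch; in practice the extracted hex_md5 would be passed in place of the reference function used in the demo call.

```javascript
const crypto = require('crypto');

// Reference MD5 from Node's standard library, returned as lowercase hex.
function md5Hex(text) {
    return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Compare a function under test against the reference on a few inputs.
function looksLikeStandardMd5(candidate, samples) {
    return samples.every(function (s) { return candidate(s) === md5Hex(s); });
}

console.log(md5Hex('abc'));   // 900150983cd24fb0d6963f7d28e17f72
console.log(looksLikeStandardMd5(md5Hex, ['', 'abc', 'page=1&size=10']));
```

Including a non-ASCII sample is worthwhile in real checks, since older hex_md5 implementations differ in how they encode such input.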

133

134

Now focus on how Ruishu generates the two suffixes. As in the 4th generation, XMLHttpRequest.send and XMLHttpRequest.open have been rewritten. As shown in the figure below, set a breakpoint in XMLHttpRequest.open, which is the _$RQ method in the figure. arguments[1] is the original URL, and the suffix is available after the _$tB method in the figure has run.
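
The general shape of such a rewritten open method can be sketched as follows. This is an illustration of the wrapping pattern only: `FakeXHR` replaces the browser's XMLHttpRequest so that it runs under Node, and `makeSuffix` is a placeholder for the real suffix generator (the role `_$tB` plays in the figure).

```javascript
// Stand-in for the browser's XMLHttpRequest.
function FakeXHR() { this.opened = null; }
FakeXHR.prototype.open = function (method, url) {
    this.opened = { method: method, url: url };
};

// Placeholder suffix generator; the real one derives the value from the URL.
function makeSuffix(url) {
    return 'suffix=' + url.length;
}

// Replace open() with a wrapper that appends the suffix before delegating.
function patchOpen(XHR, suffixFn) {
    const nativeOpen = XHR.prototype.open;
    XHR.prototype.open = function (method, url) {
        const sep = url.indexOf('?') === -1 ? '?' : '&';
        const args = Array.prototype.slice.call(arguments);
        args[1] = url + sep + suffixFn(url);   // arguments[1] is the original URL
        return nativeOpen.apply(this, args);
    };
}

patchOpen(FakeXHR, makeSuffix);
const xhr = new FakeXHR();
xhr.open('GET', '/api/list?page=1');
console.log(xhr.opened.url);
```

This is why a breakpoint inside open sees the unmodified URL in arguments[1] and the suffixed URL only after the generator has run.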

135

Step into the _$tB method in the figure. It nests several other methods; walk through the logic until you reach the _$5j method in the figure. The first part of it processes the incoming URL.

136

Next, a 16-element array is generated:

137

This 16-element array is then passed through a method that generates the first suffix, as shown in the figure below. In this article the method is _$ZO.

138

Stepping into the _$ZO method, there are five main steps:

Step 1: A 32-element array is generated;

Step 2: The previous 16-element array and two variables are concatenated to generate a 50-element array;

Step 3: Enter control flow No. 744. It turns out to be the same as control flow No. 742, which we followed for the cookie, so it is not followed again here;

Step 4: The generated first suffix value is processed to get a two-character string, which is used when obtaining the second suffix;

Step 5: The first suffix name and value are concatenated and returned. At this point the first suffix, hKHnQfLv, has been generated.

139

Back in the _$5j method: this step of _$5j in the figure obtains the value of the second suffix, 8X7Yi61c:

140

Look mainly at the _$UM method in the figure. It first concatenates the previously generated two-character string with the URL parameters, then passes the result through the _$Nr method to get the value of the second suffix.

141

Now look at the _$Nr method. It first generates a value similar to 53924, followed by a try statement. Note the _$Js method in the figure, which uses a certain value from $_ts. A string of digits is generated afterwards, and the final value is obtained after further combination and calculation.

142

143

Back in the _$UM method, the prefix 8X7Yi61c is combined with the value. At this point both suffixes have been obtained:

144

Fingerprint generation

As analyzed above, when values are added to the 128-element array, several steps process values stored in window.localStorage. These values are generated from fingerprints such as the browser canvas, and randomizing them is enough to allow concurrent requests. Accessing a single HTML page usually does not verify the fingerprint, so the short cookie passes. Some data query interfaces do verify it: the fingerprint is added to the cookie when the load event fires, which makes the cookie longer. For how to find where the fingerprint is generated, watching the video material is recommended; it explains the process clearly and is too long to cover in this article. Link: https://mp.weixin.qq.com/s/DEUc1K8WaO_Cq1a2r0Ge5g


K哥爬虫

Research and sharing of Python web crawling, JS reverse engineering, and related techniques.