
What is an AST?

An AST (Abstract Syntax Tree), often shortened to syntax tree, is a tree representation of the abstract syntactic structure of source code. Each node in the tree represents a construct in the source code. Syntax trees are not specific to any one programming language: JavaScript, Python, Java, Golang, and almost every other language have them.

As children, when we got a new toy we liked to take it apart into small pieces and then reassemble the pieces our own way, creating a new toy. JavaScript is like an exquisite machine: by parsing it into an AST, we can take it apart just as we did with those toys, understand each of its parts in depth, and then put it back together the way we want.

ASTs are widely used. IDE syntax highlighting, code checking, formatting, minification, and transpilation all convert code into an AST first and then operate on it. Because ES5 and ES6 differ in syntax, real applications need syntax conversion for backward compatibility, and that also relies on the AST. The AST was not invented for reverse engineering, but once you learn it, deobfuscation becomes much easier.

An online AST explorer is available at https://astexplorer.net/ . At the top you can choose the language, the parser, whether to enable a transform, and so on. As shown in the figure below, area ① is the source code, area ② is the corresponding AST, area ③ is the transform code, which can perform operations on the syntax tree, and area ④ is the new code generated by the transform. In the figure, the Unicode-escaped characters in the original code are restored to normal characters by the transform.

There is no single format for a syntax tree: different languages and parsers produce different results. For JavaScript, the available tools include Acorn, Espree, Esprima, Recast, and Uglify-JS. Babel is the most widely used, and the rest of this article uses Babel as the example.

01

The position of the AST in compilation

In compiler theory, a compiler usually converts code in three steps: lexical analysis, syntax analysis (parsing), and code generation. The following figure illustrates this process:

02

Lexical analysis

Lexical analysis is the first stage of compilation. It reads the source program character by character from left to right, recognizes words according to the lexical rules, and produces a token stream. For example, isPanda('🐼') is split into four tokens: isPanda , ( , '🐼' , and ) , each with a different meaning. The output of lexical analysis can be thought of as a list or array of tokens of different types.
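
To make the token stream concrete, here is a minimal, dependency-free sketch of a tokenizer. It is not how Babel tokenizes code; the function name tokenize and the token type names are assumptions chosen for illustration, and only identifiers, parentheses, and single-quoted strings are handled.

```javascript
// Toy tokenizer: splits a call such as isPanda('🐼') into typed tokens.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // skip whitespace
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Consume the whole identifier
      let j = i;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      tokens.push({ type: "Identifier", value: source.slice(i, j) });
      i = j;
    } else if (ch === "(" || ch === ")") {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    } else if (ch === "'") {
      // Consume up to and including the closing quote
      const end = source.indexOf("'", i + 1);
      if (end === -1) throw new SyntaxError("Unterminated string");
      tokens.push({ type: "String", value: source.slice(i, end + 1) });
      i = end + 1;
    } else {
      throw new SyntaxError(`Unexpected character: ${ch}`);
    }
  }
  return tokens;
}

const tokens = tokenize("isPanda('🐼')");
console.log(tokens.map((t) => t.value)); // [ 'isPanda', '(', "'🐼'", ')' ]
```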

03

Parsing

Parsing (syntax analysis) is the next logical stage of compilation. Building on lexical analysis, it combines token sequences into grammatical phrases such as "program", "statement", and "expression". In the previous example, isPanda('🐼') is parsed as an expression statement ( ExpressionStatement ), isPanda() is parsed as a call expression ( CallExpression ), and 🐼 is parsed as a literal ( Literal ). The dependencies and nesting between these grammatical structures form a tree, which is the AST.

04

Code generation

Code generation is the last step: it converts the AST into executable code. Before that conversion we can manipulate the syntax tree directly, adding, deleting, modifying, and querying nodes. For example, we can locate where a variable is declared, change a variable's value, or delete nodes. If we replace the statement isPanda('🐼') with a boolean Literal , true , the syntax tree changes as follows:

05
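
The replace-then-generate step can be sketched without any library. The AST below is written by hand using Babel-style node type names, and the generate function is a toy that only understands those few node types; it is an illustration of the idea, not Babel's generator.

```javascript
// Hand-written AST for isPanda('🐼')
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "isPanda" },
    arguments: [{ type: "StringLiteral", value: "🐼" }],
  },
};

// Toy code generator: turns the handful of node types above back into text.
function generate(node) {
  switch (node.type) {
    case "ExpressionStatement":
      return generate(node.expression) + ";";
    case "CallExpression":
      return generate(node.callee) + "(" + node.arguments.map(generate).join(", ") + ")";
    case "Identifier":
      return node.name;
    case "StringLiteral":
      return JSON.stringify(node.value);
    case "BooleanLiteral":
      return String(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const before = generate(ast);
// Swap the whole call for a boolean literal, then regenerate.
ast.expression = { type: "BooleanLiteral", value: true };
const after = generate(ast);

console.log(before); // isPanda("🐼");
console.log(after);  // true;
```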

Introduction to Babel

Babel is a JavaScript compiler, and it can also be regarded as a parsing library. Babel Chinese website: https://www.babeljs.cn/ ; Babel official English website: https://babeljs.io/ . Babel has many built-in methods for analyzing JavaScript code. We can use it to convert JavaScript code into an AST, add, delete, modify, or query nodes, and then convert the result back into JavaScript code.

Babel comprises many packages, APIs, and optional parameters for each method, which this article does not list one by one. In practice, consult the official documentation often, or refer to the learning materials given at the end of the article. Babel is installed like any other Node package; install whichever packages you need, for example: npm install @babel/core @babel/parser @babel/traverse @babel/generator @babel/types

In deobfuscation work, the following Babel packages are mainly used, and this article covers only these:

  1. @babel/core : the Babel compiler itself, which provides the Babel compilation API;
  2. @babel/parser : parses JavaScript code into an AST;
  3. @babel/traverse : traverses and modifies the nodes of the AST;
  4. @babel/generator : converts the AST back into JavaScript code;
  5. @babel/types : checks and validates node types, builds new AST nodes, and so on.

06

@babel/core

@babel/core is the Babel compiler itself. Its work is split across three modules, @babel/parser , @babel/traverse , and @babel/generator , and it re-exposes parts of them. For example, within each pair below the two imports have largely the same effect:

 const parse = require("@babel/parser").parse;
const parse = require("@babel/core").parse;

const traverse = require("@babel/traverse").default
const traverse = require("@babel/core").traverse

@babel/parser

@babel/parser parses JavaScript code into an AST. It mainly provides two methods:

  • parser.parse(code, [{options}]) : parses a piece of JavaScript code;
  • parser.parseExpression(code, [{options}]) : parses a single JavaScript expression, with performance in mind.

Some of the optional parameters ( options ):

  • allowImportExportEverywhere : by default, import and export declarations can only appear at the top level of the program; set to true to allow them anywhere;
  • allowReturnOutsideFunction : by default, a return statement at the top level raises an error; set to true to allow it;
  • sourceType : the default is script , which raises an error when the code contains import or export ; in that case it must be set to module ;
  • errorRecovery : by default, Babel throws an error when it finds invalid code; set to true to continue parsing and record the errors in the errors attribute of the resulting AST. If a serious error is encountered, parsing is still aborted.

A concrete example:

 const parser = require("@babel/parser");

const code = "const a = 1;";
const ast = parser.parse(code, {sourceType: "module"})
console.log(ast)

{sourceType: "module"} shows how to pass optional parameters. The output is the AST, which matches the syntax tree shown by the online explorer https://astexplorer.net/ :

07

@babel/generator

@babel/generator converts an AST back into JavaScript code. It provides a generate method: generate(ast, [{options}], code) .

Some of the optional parameters ( options ):

  • auxiliaryCommentBefore : text for a block comment added at the start of the output file;
  • auxiliaryCommentAfter : text for a block comment added at the end of the output file;
  • comments : whether comments are included in the output;
  • compact : whether to avoid adding whitespace for formatting;
  • concise : whether to reduce whitespace to make the output more compact;
  • minified : whether to minify the output code;
  • retainLines : try to use the same line numbers in the output code as in the source code.

Continuing the previous example, where the original code is const a = 1; , we now rename the variable a to b , change the value 1 to 2 , and then generate new JS code from the AST:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default

const code = "const a = 1;";
const ast = parser.parse(code, {sourceType: "module"})
ast.program.body[0].declarations[0].id.name = "b"
ast.program.body[0].declarations[0].init.value = 2
const result = generate(ast, {minified: true})

console.log(result.code)

The final output is const b=2; . The variable name and value have been changed successfully, and because the output is minified, the spaces around the equals sign are gone.

In the code, {minified: true} shows how to pass optional parameters; here it requests minified output. generate returns an object whose code property is the final JS code.

In the code, ast.program.body[0].declarations[0].id.name is the position of a in the AST, and ast.program.body[0].declarations[0].init.value is the position of 1 , as shown in the following figure:

08

@babel/traverse

When there is a lot of code, we cannot locate and modify nodes one by one as we did above. For nodes of the same type, we can traverse all nodes and modify them in one pass. This is what @babel/traverse is for. It is usually used together with a visitor : an object (the name is arbitrary) whose methods select the nodes to process. Here is an example:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default

const code = `
const a = 1500;
const b = 60;
const c = "hi";
const d = 787;
const e = "1244";
`
const ast = parser.parse(code)

const visitor = {
    NumericLiteral(path){
        path.node.value = (path.node.value + 100) * 2
    },
    StringLiteral(path){
        path.node.value = "I Love JavaScript!"
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

The original code defines five variables, a through e , whose values are either numbers or strings. In the AST, the corresponding node types are NumericLiteral and StringLiteral :

09

Then we declare a visitor object and define a handler for each node type. traverse takes two parameters: the first is the AST object and the second is the visitor . When traverse walks the nodes and encounters a NumericLiteral or StringLiteral node, it calls the corresponding handler in the visitor . Each handler receives a path object for the current node, of type NodePath . This object has many attributes; the most commonly used are:

  • toString() : the source code of the current path;
  • node : the node of the current path;
  • parent : the parent node of the current path;
  • parentPath : the parent path of the current path;
  • type : the type of the current path.

PS: Besides its attributes, the path object also has many methods, for example for replacing, deleting, and inserting nodes, finding parent nodes, getting sibling nodes, adding comments, and checking node types. Consult the documentation or read the source code when you need them. The @babel/types section below gives some examples, and future hands-on articles will include more. For reasons of space, this article does not go into further detail.

So in the code above, path.node.value gets the value of the literal, which we can then modify. After running the code, every number has 100 added and is then multiplied by 2, and every string is replaced with I Love JavaScript! . The result is as follows:

 const a = 3200;
const b = 320;
const c = "I Love JavaScript!";
const d = 1774;
const e = "I Love JavaScript!";
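
To show what a visitor-driven traversal does under the hood, here is a small dependency-free sketch. It is a toy, not Babel's traverse: the path object passed to each handler only carries node and parent, and the declare helper is a hypothetical shortcut for building plain-object nodes.

```javascript
// Toy traversal: walks a plain-object AST depth-first and calls the visitor
// method named after each node's type, passing a minimal path-like object.
function traverse(node, visitor, parent = null) {
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor, parent));
    return;
  }
  // Anything that is not an object with a string `type` is not a node.
  if (node === null || typeof node !== "object" || typeof node.type !== "string") return;
  const handler = visitor[node.type];
  if (handler) handler({ node, parent });
  for (const value of Object.values(node)) traverse(value, visitor, node);
}

// Hypothetical helper: builds the node for `const <name> = <init>;`
const declare = (name, init) => ({
  type: "VariableDeclaration",
  kind: "const",
  declarations: [{ type: "VariableDeclarator", id: { type: "Identifier", name }, init }],
});

const program = {
  type: "Program",
  body: [
    declare("a", { type: "NumericLiteral", value: 1500 }),
    declare("c", { type: "StringLiteral", value: "hi" }),
  ],
};

traverse(program, {
  NumericLiteral(path) { path.node.value = (path.node.value + 100) * 2; },
  StringLiteral(path) { path.node.value = "I Love JavaScript!"; },
});

const values = program.body.map((d) => d.declarations[0].init.value);
console.log(values); // [ 3200, 'I Love JavaScript!' ]
```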

If several node types are handled the same way, you can join the type names with | into a single string key and apply the same method to all of them:

 const visitor = {
    "NumericLiteral|StringLiteral"(path) {
        path.node.value = "I Love JavaScript!"
    }
}
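
One way to picture the | shorthand is as a key that gets split into one entry per node type before traversal. The sketch below illustrates that idea only; explodeVisitor is a hypothetical name, and the code is a simplified picture rather than Babel's own normalization logic.

```javascript
// Expands keys such as "NumericLiteral|StringLiteral" into one entry per type.
function explodeVisitor(visitor) {
  const exploded = {};
  for (const [key, handler] of Object.entries(visitor)) {
    for (const type of key.split("|")) {
      exploded[type] = handler;
    }
  }
  return exploded;
}

const shared = (path) => { path.node.value = "I Love JavaScript!"; };
const exploded = explodeVisitor({ "NumericLiteral|StringLiteral": shared });

console.log(Object.keys(exploded)); // [ 'NumericLiteral', 'StringLiteral' ]
console.log(exploded.NumericLiteral === exploded.StringLiteral); // true
```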

A visitor object can be written in several ways. The following forms all have the same effect:

 const visitor = {
    NumericLiteral(path){
        path.node.value = (path.node.value + 100) * 2
    },
    StringLiteral(path){
        path.node.value = "I Love JavaScript!"
    }
}
 const visitor = {
    NumericLiteral: function (path){
        path.node.value = (path.node.value + 100) * 2
    },
    StringLiteral: function (path){
        path.node.value = "I Love JavaScript!"
    }
}
 const visitor = {
    NumericLiteral: {
        enter(path) {
            path.node.value = (path.node.value + 100) * 2
        }
    },
    StringLiteral: {
        enter(path) {
            path.node.value = "I Love JavaScript!"
        }
    }
}
 const visitor = {
    enter(path) {
        if (path.node.type === "NumericLiteral") {
            path.node.value = (path.node.value + 100) * 2
        }
        if (path.node.type === "StringLiteral") {
            path.node.value = "I Love JavaScript!"
        }
    }
}

All of the forms above use the enter hook. During traversal, each node is visited twice: once on entry (enter) and once on exit (exit). traverse handles nodes on entry by default; if you want to process a node when exiting it, you must declare an exit method in the visitor .
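
The order in which enter and exit fire can be seen with a tiny dependency-free walker. This is a toy model: the children field is an assumption made to keep the tree generic, and the walk function is not part of Babel.

```javascript
// Toy traversal with enter/exit hooks, recording the order in which they fire.
function walk(node, visitor) {
  if (visitor.enter) visitor.enter(node);
  for (const child of node.children || []) walk(child, visitor);
  if (visitor.exit) visitor.exit(node);
}

const tree = {
  type: "VariableDeclaration",
  children: [{ type: "VariableDeclarator", children: [{ type: "NumericLiteral" }] }],
};

const order = [];
walk(tree, {
  enter(node) { order.push(`enter ${node.type}`); },
  exit(node) { order.push(`exit ${node.type}`); },
});

console.log(order.join("\n"));
// enter VariableDeclaration
// enter VariableDeclarator
// enter NumericLiteral
// exit NumericLiteral
// exit VariableDeclarator
// exit VariableDeclaration
```

A node's exit hook runs only after all of its children have been entered and exited, which is why exit is the natural place for work that depends on already-processed children.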

@babel/types

@babel/types is mainly used to build new AST nodes. The previous example code is const a = 1; . If you want to add content, for example to turn it into const a = 1; const b = a * 5 + 1; , you can do so with @babel/types .

First look at the AST: the original code has only one VariableDeclaration node, and now there is one more:

10

The idea, then, is to visit the VariableDeclaration node and insert another VariableDeclaration node after it. To create a VariableDeclaration node, use the types.variableDeclaration() method. The builder methods in types have the same names as the node types we see in the AST, except that the first letter is lowercase, so we can usually guess a method's name without knowing every method. Knowing the method name is not enough, though: you also need to know its parameters. You can check the documentation, but Brother K recommends reading the source directly, which is very clear. Taking PyCharm as an example, hold down Ctrl and click the method name to jump to the source:

11

 function variableDeclaration(kind: "var" | "let" | "const", declarations: Array<BabelNodeVariableDeclarator>)

As you can see, two parameters are required, kind and declarations , where declarations is a list of VariableDeclarator nodes. So we can write the following visitor code, in which path.insertAfter() inserts a new node after the current one:

 const visitor = {
    VariableDeclaration(path) {
        let declaration = types.variableDeclaration("const", [declarator])
        path.insertAfter(declaration)
    }
}

Next, we need to define declarator , a node of type VariableDeclarator . Its source is as follows:

 function variableDeclarator(id: BabelNodeLVal, init?: BabelNodeExpression)

Looking at the AST, id is an Identifier node and init is a BinaryExpression node, as shown in the following figure:

12

Start with id , which can be generated with the types.identifier() method. Its source is function identifier(name: string) , and the name here is b . The visitor code can now be written like this:

 const visitor = {
    VariableDeclaration(path) {
        let declarator = types.variableDeclarator(types.identifier("b"), init)
        let declaration = types.variableDeclaration("const", [declarator])
        path.insertAfter(declaration)
    }
}

Then look at how to define init . First, look at the AST structure again:

13

init is a BinaryExpression node whose left operand is another BinaryExpression and whose right operand is a NumericLiteral . It can be generated with types.binaryExpression() , whose source is as follows:

 function binaryExpression(
    operator: "+" | "-" | "/" | "%" | "*" | "**" | "&" | "|" | ">>" | ">>>" | "<<" | "^" | "==" | "===" | "!=" | "!==" | "in" | "instanceof" | ">" | "<" | ">=" | "<=",
    left: BabelNodeExpression | BabelNodePrivateName, 
    right: BabelNodeExpression
)

At this point the visitor code can be written like this:

 const visitor = {
    VariableDeclaration(path) {
        let init = types.binaryExpression("+", left, right)
        let declarator = types.variableDeclarator(types.identifier("b"), init)
        let declaration = types.variableDeclaration("const", [declarator])
        path.insertAfter(declaration)
    }
}

Then construct left and right in the same way: look at the AST, check which parameters the corresponding builder takes, and nest layer by layer until every node is built. The final visitor code looks like this:

 const visitor = {
    VariableDeclaration(path) {
        let left = types.binaryExpression("*", types.identifier("a"), types.numericLiteral(5))
        let right = types.numericLiteral(1)
        let init = types.binaryExpression("+", left, right)
        let declarator = types.variableDeclarator(types.identifier("b"), init)
        let declaration = types.variableDeclaration("const", [declarator])
        path.insertAfter(declaration)
        path.stop()
    }
}

Note: path.stop() is added after the path.insertAfter() call. It stops the traversal immediately once the insertion is done. The newly inserted node is itself a VariableDeclaration , so without the stop call the visitor would keep inserting nodes in an infinite loop.

After inserting the new node and converting the AST back into JavaScript code, you can see one extra line of code, as shown in the following figure:
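
Why the stop call matters can be illustrated with a simplified model in which newly inserted statements are visited too. This is not Babel's implementation: visitAll, the path-like object it passes, and the maxVisits safety cap are all assumptions made so the runaway case terminates in the demo.

```javascript
// Simplified model: visit each statement in order; statements inserted during
// the walk are visited as well. maxVisits only exists to end the runaway case.
function visitAll(body, onDeclaration, maxVisits = 20) {
  let stopped = false;
  let visits = 0;
  const stop = () => { stopped = true; };
  for (let i = 0; i < body.length && !stopped && visits < maxVisits; i++) {
    visits++;
    if (body[i].type === "VariableDeclaration") {
      onDeclaration({
        insertAfter: (node) => body.splice(i + 1, 0, node),
        stop,
      });
    }
  }
  return visits;
}

const newDeclaration = () => ({ type: "VariableDeclaration", name: "b" });

// Without stop(): every inserted declaration triggers another insertion,
// so only the safety cap ends the loop.
const runaway = [{ type: "VariableDeclaration", name: "a" }];
const runawayVisits = visitAll(runaway, (path) => path.insertAfter(newDeclaration()));

// With stop(): exactly one declaration is inserted.
const guarded = [{ type: "VariableDeclaration", name: "a" }];
const guardedVisits = visitAll(guarded, (path) => {
  path.insertAfter(newDeclaration());
  path.stop();
});

console.log(runawayVisits, runaway.length); // 20 21
console.log(guardedVisits, guarded.length); // 1 2
```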

14

Common deobfuscation techniques

Once you understand the AST and Babel, you can restore obfuscated JavaScript code. The following examples will help you become more familiar with the various Babel operations.

String restoration

The figure at the beginning of the article shows an example in which normal characters were replaced with Unicode escapes:

 console['\u006c\u006f\u0067']('\u0048\u0065\u006c\u006c\u006f\u0020\u0077\u006f\u0072\u006c\u0064\u0021')

Observe the AST structure:

15

We can see that the Unicode-escaped text is stored in raw , while rawValue and value hold the normal characters. So we can replace raw with rawValue or value . Pay attention to the quotes: the original is console["log"] , and if the restored text loses its quotes it becomes console[log] , which naturally raises an error. Besides replacing the value, you can also delete the extra node, or delete just its raw property. Any of the following approaches therefore restores the code:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default

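// String.raw keeps the \uXXXX escapes as literal text; a plain template literal would decode them before Babel sees the code
const code = String.raw`console['\u006c\u006f\u0067']('\u0048\u0065\u006c\u006c\u006f\u0020\u0077\u006f\u0072\u006c\u0064\u0021')`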
const ast = parser.parse(code)

const visitor = {
    StringLiteral(path) {
        // Any one of the following approaches works
        // path.node.extra.raw = '"' + path.node.extra.rawValue + '"'
        // path.node.extra.raw = '"' + path.node.value + '"'
        // delete path.node.extra
        delete path.node.extra.raw
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

Restore result:

 console["log"]("Hello world!");
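
The relationship between the escaped raw text and the decoded value can be sketched without Babel. The function below is a minimal illustration, assuming only four-digit \uXXXX escapes need decoding; decodeUnicodeEscapes is a hypothetical helper, and other escape forms are deliberately left untouched.

```javascript
// Decodes \uXXXX escapes in the raw text of a string literal.
// Other escape forms (\xXX, \u{...}, \n, ...) are left as they are.
function decodeUnicodeEscapes(raw) {
  return raw.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes, so this is the text as it appears in the source file.
const raw = String.raw`'\u006c\u006f\u0067'`;
const decoded = decodeUnicodeEscapes(raw);

console.log(raw);     // '\u006c\u006f\u0067'
console.log(decoded); // 'log'
```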

Expression restoration

Brother K previously wrote about restoring JSFuck obfuscation, which explained that ![] represents false and that !![] or !+[] represents true. Obfuscated code often uses such operations to complicate simple expressions, and a statement usually has to be executed to get the real result. The sample code is as follows:

 const a = !![]+!![]+!![];
const b = Math.floor(12.34 * 2.12)
const c = 10 >> 3 << 1
const d = String(21.3 + 14 * 1.32)
const e = parseInt("1.893" + "45.9088")
const f = parseFloat("23.2334" + "21.89112")
const g = 20 < 18 ? '未成年' : '成年'

To execute these expressions we need the path.evaluate() method, which evaluates the path and returns an object: its confident attribute indicates whether the result could be determined with confidence, and value is the computed result. We then use types.valueToNode() to create a node from the result, and path.replaceInline() to replace the original node with it. There are several replacement methods:

  • replaceWith : replaces one node with another node;
  • replaceWithMultiple : replaces one node with multiple nodes;
  • replaceWithSourceString : parses the given source code string into a node and then replaces with it; performance is poor and it is not recommended;
  • replaceInline : replaces a node with one or more nodes, combining the behavior of the first two methods.

The corresponding AST processing code is as follows:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default
const types = require("@babel/types")

const code = `
const a = !![]+!![]+!![];
const b = Math.floor(12.34 * 2.12)
const c = 10 >> 3 << 1
const d = String(21.3 + 14 * 1.32)
const e = parseInt("1.893" + "45.9088")
const f = parseFloat("23.2334" + "21.89112")
const g = 20 < 18 ? '未成年' : '成年'
`
const ast = parser.parse(code)

const visitor = {
    "BinaryExpression|CallExpression|ConditionalExpression"(path) {
        const {confident, value} = path.evaluate()
        if (confident){
            path.replaceInline(types.valueToNode(value))
        }
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

Final result:

 const a = 3;
const b = 26;
const c = 2;
const d = "39.78";
const e = parseInt("1.89345.9088");
const f = parseFloat("23.233421.89112");
const g = "\u6210\u5E74";
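
In the result above, the parseInt and parseFloat calls were left in place while their string arguments were folded, which is what the confident flag is for: a node is only replaced when its value can be computed statically. The sketch below is a toy constant folder that borrows the { confident, value } result shape; it is not Babel's evaluator, and it only understands literals and a handful of binary operators.

```javascript
// Toy constant folder. Anything it does not understand (identifiers, calls,
// unsupported operators) is reported as not confident.
const operators = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
  [">>", (a, b) => a >> b],
  ["<<", (a, b) => a << b],
]);

function evaluate(node) {
  if (node.type === "NumericLiteral" || node.type === "StringLiteral") {
    return { confident: true, value: node.value };
  }
  if (node.type === "BinaryExpression" && operators.has(node.operator)) {
    const left = evaluate(node.left);
    const right = evaluate(node.right);
    if (left.confident && right.confident) {
      return { confident: true, value: operators.get(node.operator)(left.value, right.value) };
    }
  }
  return { confident: false, value: undefined };
}

// Hypothetical helpers for building plain-object nodes
const num = (value) => ({ type: "NumericLiteral", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

// 10 >> 3 << 1 parses as (10 >> 3) << 1
const foldable = evaluate(bin("<<", bin(">>", num(10), num(3)), num(1)));
// a * 5 cannot be folded because the value of `a` is unknown here
const unknown = evaluate(bin("*", { type: "Identifier", name: "a" }, num(5)));

console.log(foldable); // { confident: true, value: 2 }
console.log(unknown);  // { confident: false, value: undefined }
```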

Delete unused variables

Code sometimes contains redundant variables that are never used. Deleting them makes the code easier to analyze. The sample code is as follows:

 const a = 1;
const b = a * 2;
const c = 2;
const d = b + 1;
const e = 3;
console.log(d)

To delete redundant variables, you first need to understand scope on NodePath . A scope is mainly used to look up the scope of an identifier and to get and modify all references to it. To delete unused variables we mainly use the scope.getBinding() method, which takes the name of an identifier visible from the current node. The key attributes of the returned binding are as follows:

  • identifier : the Node object of the identifier;
  • path : the NodePath object of the identifier;
  • constant : whether the identifier is a constant;
  • referenced : whether the identifier is referenced;
  • references : the number of times the identifier is referenced;
  • constantViolations : if the identifier is modified, stores all Path objects that modify it;
  • referencePaths : if the identifier is referenced, stores all Path objects that reference it.

So we can use constantViolations , referenced , references , and referencePaths to decide whether a variable can be deleted. The AST processing code is as follows:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default

const code = `
const a = 1;
const b = a * 2;
const c = 2;
const d = b + 1;
const e = 3;
console.log(d)
`
const ast = parser.parse(code)

const visitor = {
    VariableDeclarator(path){
        const binding = path.scope.getBinding(path.node.id.name);

        // If the identifier has been reassigned, it must not be removed.
        if (!binding || binding.constantViolations.length > 0) {
            return;
        }

        // Not referenced
        if (!binding.referenced) {
            path.remove();
        }

        // Reference count is 0
        // if (binding.references === 0) {
        //     path.remove();
        // }

        // Length is 0: the variable is never referenced
        // if (binding.referencePaths.length === 0) {
        //     path.remove();
        // }
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

Processed code (the unused variables c and e have been removed):

 const a = 1;
const b = a * 2;
const d = b + 1;
console.log(d);
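
Removing one unused declaration can leave another one unused, so a single pass is not always enough. The sketch below models that with plain data and repeats until nothing changes. It is a toy: the uses field stands in for real scope analysis, and removeUnused is a hypothetical helper, not a Babel API.

```javascript
// Toy model: each declaration lists the names its initializer reads.
const declarations = [
  { name: "a", uses: [] },
  { name: "b", uses: ["a"] },
  { name: "c", uses: [] },
  { name: "d", uses: ["b"] },
  { name: "e", uses: [] },
];
const externalUses = ["d"]; // stands for console.log(d)

function removeUnused(decls, external) {
  let current = decls;
  for (;;) {
    const referenced = new Set([...external, ...current.flatMap((d) => d.uses)]);
    const kept = current.filter((d) => referenced.has(d.name));
    if (kept.length === current.length) return kept;
    current = kept; // removing a declaration may orphan others, so repeat
  }
}

const remaining = removeUnused(declarations, externalUses).map((d) => d.name);
console.log(remaining); // [ 'a', 'b', 'd' ]

// If nothing used d, the whole chain d -> b -> a would go, one round at a time.
const none = removeUnused(declarations, []).map((d) => d.name);
console.log(none); // []
```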

Remove redundant logic code

Sometimes, to make reverse engineering harder, code contains many nested if-else statements with a lot of redundant logic whose conditions are always false. This can also be removed through the AST, keeping only the branches whose conditions are true. The sample code is as follows:

 const example = function () {
    let a;
    if (false) {
        a = 1;
    } else {
        if (1) {
            a = 2;
        }
        else {
            a = 3;
        }
    }
    return a;
};

Observe the AST: the condition corresponds to the test node, the if branch to the consequent node, and the else branch to the alternate node, as shown in the following figure:

16

AST processing approach and code:

  1. Select the nodes whose test is a BooleanLiteral or NumericLiteral , and take its value, path.node.test.value ;
  2. If the value is truthy, replace the node with the contents of the consequent node, path.node.consequent.body ;
  3. If the value is falsy, replace the node with the contents of the alternate node, path.node.alternate.body ;
  4. Some if statements have no else and therefore no alternate . In that case, if the value is falsy, remove the node directly with path.remove() .

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default
const types = require('@babel/types');

const code = `
const example = function () {
    let a;
    if (false) {
        a = 1;
    } else {
        if (1) {
            a = 2;
        }
        else {
            a = 3;
        }
    }
    return a;
};
`
const ast = parser.parse(code)

const visitor = {
    enter(path) {
        if (types.isBooleanLiteral(path.node.test) || types.isNumericLiteral(path.node.test)) {
            if (path.node.test.value) {
                path.replaceInline(path.node.consequent.body);
            } else {
                if (path.node.alternate) {
                    path.replaceInline(path.node.alternate.body);
                } else {
                    path.remove()
                }
            }
        }
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

Result:

 const example = function () {
  let a;
  a = 2;
  return a;
};
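
The same branch-selection logic can be sketched on plain objects, which makes the recursion into nested if statements explicit. This is a toy model under stated assumptions: branches are represented directly as arrays of statements rather than BlockStatement nodes, and simplify and assign are hypothetical helpers.

```javascript
// A branch can be resolved statically only when its test is a literal.
function isLiteralTest(test) {
  return test.type === "BooleanLiteral" || test.type === "NumericLiteral";
}

function simplify(statements) {
  return statements.flatMap((stmt) => {
    if (stmt.type !== "IfStatement") return [stmt];
    if (!isLiteralTest(stmt.test)) {
      // Unknown condition: keep the if, but still clean up inside both branches.
      return [{
        ...stmt,
        consequent: simplify(stmt.consequent),
        alternate: stmt.alternate ? simplify(stmt.alternate) : null,
      }];
    }
    const taken = stmt.test.value ? stmt.consequent : stmt.alternate;
    return taken ? simplify(taken) : []; // no else branch: drop the statement
  });
}

const assign = (value) => ({ type: "Assignment", target: "a", value });
const body = [
  {
    type: "IfStatement",
    test: { type: "BooleanLiteral", value: false },
    consequent: [assign(1)],
    alternate: [{
      type: "IfStatement",
      test: { type: "NumericLiteral", value: 1 },
      consequent: [assign(2)],
      alternate: [assign(3)],
    }],
  },
];

const simplified = simplify(body);
console.log(simplified); // [ { type: 'Assignment', target: 'a', value: 2 } ]
```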

Reversing switch-case control flow flattening

Control flow flattening is among the most common obfuscation techniques. It breaks the code into steps dispatched through if-else or while-switch-case statements. Sample code:

 const _0x34e16a = '3,4,0,5,1,2'['split'](',');
let _0x2eff02 = 0x0;
while (!![]) {
    switch (_0x34e16a[_0x2eff02++]) {
        case'0':
            let _0x38cb15 = _0x4588f1 + _0x470e97;
            continue;
        case'1':
            let _0x1e0e5e = _0x37b9f3[_0x50cee0(0x2e0, 0x2e8, 0x2e1, 0x2e4)];
            continue;
        case'2':
            let _0x35d732 = [_0x388d4b(-0x134, -0x134, -0x139, -0x138)](_0x38cb15 >> _0x4588f1);
            continue;
        case'3':
            let _0x4588f1 = 0x1;
            continue;
        case'4':
            let _0x470e97 = 0x2;
            continue;
        case'5':
            let _0x37b9f3 = 0x5 || _0x38cb15;
            continue;
    }
    break;
}

AST restoration approach:

  1. Obtain the control-flow array, converting a statement such as '3,4,0,5,1,2'['split'](',') into ['3','4','0','5','1','2'] . Once you have the array, you can also delete the node for the split statement, since it is useless in the final code;
  2. Iterate over the control-flow array from step 1 and take the case node corresponding to each value in turn;
  3. Define an array to store the consequent contents of each case node, removing the node for the continue statement;
  4. When the iteration is complete, replace the entire while node ( WhileStatement ) with the array from step 3.

Different approaches lead to different code. To obtain the control-flow array, you can proceed in either of these ways:

  1. Get the while statement node, use path.getAllPrevSiblings() to get all the sibling nodes before it, iterate over them to find the one that declares the array used in switch() , and take that node's value for further processing;
  2. Take the name of the array used in switch() directly, use scope.getBinding() to get the node it is bound to, and take that node's value for further processing.

Accordingly, there are two ways to write the AST processing code. Method 1 ( code.js contains the previous example code; for convenience, fs is used to read it from the file):

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default
const types = require("@babel/types")
const fs = require("fs");

const code = fs.readFileSync("code.js", {encoding: "utf-8"});
const ast = parser.parse(code)

const visitor = {
    WhileStatement(path) {
        // The switch node
        let switchNode = path.node.body.body[0];
        // Name of the control-flow array used in the switch statement; _0x34e16a in this example
        let arrayName = switchNode.discriminant.object.name;
        // Get all sibling nodes before the while statement; in this example they are the two declarations, const _0x34e16a and let _0x2eff02
        let prevSiblings = path.getAllPrevSiblings();
        // Holds the control-flow array
        let array = []
        // Iterate over all previous siblings
        prevSiblings.forEach(prevNode => {
            let {id, init} = prevNode.node.declarations[0];
            // If the declared name matches the control-flow array name used in the switch
            if (arrayName === id.name) {
                // Extract the string, the method name, and the separator from the expression
                let object = init.callee.object.value;
                let property = init.callee.property.value;
                let argument = init.arguments[0].value;
                // Emulate executing '3,4,0,5,1,2'['split'](',')
                array = object[property](argument)
                // You could also split the string directly, but that is less general: it breaks if the separator changes to |, for example
                // array = init.callee.object.value.split(',');
            }
            // The previous sibling is no longer needed and can be removed
            prevNode.remove();
        });

        // Stores the statements in their correct order
        let replace = [];
        // Walk the control-flow array and collect the case bodies in the correct order
        // (this assumes the cases are written in order, so each label doubles as an index into cases)
        array.forEach(index => {
                let consequent = switchNode.cases[index].consequent;
                // If the last statement is a continue, drop the ContinueStatement node
                if (types.isContinueStatement(consequent[consequent.length - 1])) {
                    consequent.pop();
                }
                // Append this case body to the ordered result
                replace = replace.concat(consequent);
            }
        );
        // Replace the whole while node; either method works
        path.replaceWithMultiple(replace);
        // path.replaceInline(replace);
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

Method 2:

 const parser = require("@babel/parser");
const generate = require("@babel/generator").default
const traverse = require("@babel/traverse").default
const types = require("@babel/types")
const fs = require("fs");

const code = fs.readFileSync("code.js", {encoding: "utf-8"});
const ast = parser.parse(code)

const visitor = {
    WhileStatement(path) {
        // The switch node
        let switchNode = path.node.body.body[0];
        // Name of the control-flow array used in the switch statement; _0x34e16a in this example
        let arrayName = switchNode.discriminant.object.name;
        // Get the binding for the control-flow array
        let bindingArray = path.scope.getBinding(arrayName);
        // Extract the string, the method name, and the separator from the expression
        let init = bindingArray.path.node.init;
        let object = init.callee.object.value;
        let property = init.callee.property.value;
        let argument = init.arguments[0].value;
        // Emulate executing '3,4,0,5,1,2'['split'](',')
        let array = object[property](argument)
        // You could also split the string directly, but that is less general: it breaks if the separator changes to |, for example
        // let array = init.callee.object.value.split(',');

        // Name of the auto-increment index variable in the switch; _0x2eff02 in this example
        let autoIncrementName = switchNode.discriminant.property.argument.name;
        // Get the binding for the auto-increment variable
        let bindingAutoIncrement = path.scope.getBinding(autoIncrementName);
        // Optional: remove the declarations of the control-flow array and the index variable
        bindingArray.path.remove();
        bindingAutoIncrement.path.remove();

        // Stores the statements in their correct order
        let replace = [];
        // Walk the control-flow array and collect the case bodies in the correct order
        // (this assumes the cases are written in order, so each label doubles as an index into cases)
        array.forEach(index => {
                let consequent = switchNode.cases[index].consequent;
                // If the last statement is a continue, drop the ContinueStatement node
                if (types.isContinueStatement(consequent[consequent.length - 1])) {
                    consequent.pop();
                }
                // Append this case body to the ordered result
                replace = replace.concat(consequent);
            }
        );
        // Replace the whole while node; either method works
        path.replaceWithMultiple(replace);
        // path.replaceInline(replace);
    }
}

traverse(ast, visitor)
const result = generate(ast)
console.log(result.code)

After running the code above, the original switch-case control flow is restored to straight-line code, which is much more concise and readable:

 let _0x4588f1 = 0x1;
let _0x470e97 = 0x2;
let _0x38cb15 = _0x4588f1 + _0x470e97;
let _0x37b9f3 = 0x5 || _0x38cb15;
let _0x1e0e5e = _0x37b9f3[_0x50cee0(0x2e0, 0x2e8, 0x2e1, 0x2e4)];
let _0x35d732 = [_0x388d4b(-0x134, -0x134, -0x139, -0x138)](_0x38cb15 >> _0x4588f1);

References

This article refers to the following materials, which are also recommended online learning materials:

END

There is not much Chinese-language material on the Babel compiler. Read the source code and compare against the visualized AST online at the same time, and patiently analyze it layer by layer. The cases in this article are only the most basic operations, and real code has to be adapted to the situation, for example by adding type checks as constraints. In follow-up articles, Brother K will use real-world cases to help you become familiar with other deobfuscation operations.


K哥爬虫

Research and sharing on Python web crawlers, JS reverse engineering, and related techniques.