
As a front-end developer, you can get your daily work done whether or not you know what an AST is. Still, less, babel, eslint, code minification, and even the fact that the JavaScript in our everyday projects can run in a browser all rely on ASTs. Once you understand ASTs, you can also build a few tools of your own to liven up otherwise monotonous work.

What is AST

AST (Abstract Syntax Tree) is an abstract representation of the syntactic structure of source code. It expresses the structure of a program as a tree, and each node of the tree represents a construct in the source code. The syntax is "abstract" because the tree does not show every detail that appears in the real syntax. For example, nested parentheses are implicit in the structure of the tree and are not presented as nodes, and an if-condition-then construct can be represented by a node with three branches. (The above definition comes from Wikipedia.)
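
To make the "abstract" part concrete, here is a minimal sketch in plain JavaScript. The node shapes follow the style of the trees shown later in this article, but this tree is written by hand, and the evaluate helper is a hypothetical name made up for this illustration. Notice that the parentheses in (1 + 2) * 3 never show up as nodes; the nesting alone carries the grouping.

```javascript
// Hand-written AST for the expression (1 + 2) * 3.
// There is no "parenthesis" node: the grouping is expressed by nesting.
const ast = {
  type: 'BinaryExpression',
  operator: '*',
  left: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Literal', value: 1 },
    right: { type: 'Literal', value: 2 }
  },
  right: { type: 'Literal', value: 3 }
};

// Hypothetical helper: walk the tree bottom-up and compute its value.
function evaluate(node) {
  if (node.type === 'Literal') {
    return node.value;
  }
  if (node.type === 'BinaryExpression') {
    const left = evaluate(node.left);
    const right = evaluate(node.right);
    switch (node.operator) {
      case '+': return left + right;
      case '-': return left - right;
      case '*': return left * right;
      case '/': return left / right;
      default: throw new Error('Unsupported operator: ' + node.operator);
    }
  }
  throw new Error('Unsupported node type: ' + node.type);
}

console.log(evaluate(ast)); // 9
```

Because the addition sits deeper in the tree than the multiplication, it is evaluated first, which is exactly what the parentheses asked for.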

JavaScript AST conversion tool

For JavaScript, a JS parser converts JS code into an AST. Several JS parsers are in common use; the examples in this article are all implemented with esprima.

How to convert code to AST

In the process of converting code into an AST, there are two important stages: lexical analysis and syntax analysis.

Lexical analysis

Also called tokenization, it is the process of converting code in the form of a string into a sequence of tokens. A token here is a string that forms the smallest unit of the source code, similar to a word in English. Lexical analysis can be thought of as combining letters into words. It does not care about the relationships between words. For example, during lexical analysis a bracket is recognized as a token, but whether the brackets match is not checked.

Tokens in JavaScript mainly include the following:

Keywords: var, let, const, etc.

Identifier: consecutive characters not enclosed in quotation marks, which may be a variable name, a keyword such as if or else, or a built-in constant such as true or false (esprima reports the latter two as Keyword and Boolean tokens)

Operators: +, -, *, / etc.

Numbers: hexadecimal, decimal, octal, scientific notation, etc.

String: text enclosed in quotation marks, such as the value assigned to a variable

Spaces: consecutive spaces, line breaks, indentation, etc.

Comment: a line comment or block comment, treated as a minimal syntactic unit that cannot be split

Punctuation: braces, parentheses, semicolons, colons, etc.

The following are the tokens that esprima generates from lexical analysis of const a = 'hello world'.

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'hello world'"
    }
]
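
To get a feel for what happens in this stage, here is a toy tokenizer in plain JavaScript. It is only a sketch for illustration and not how esprima is implemented: the tokenize function, the KEYWORDS set, and the single regular expression are all assumptions of this example, and it recognizes just a tiny subset of JavaScript. For the sample statement it produces the same token list as above.

```javascript
// Toy tokenizer: a sketch for illustration, not how esprima works internally.
const KEYWORDS = new Set(['var', 'let', 'const', 'if', 'else']);

function tokenize(code) {
  const tokens = [];
  // One alternative per token kind; leading whitespace is matched and skipped.
  // The sticky flag (y) forces each match to start exactly at lastIndex.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+(?:\.\d+)?)|('[^']*'|"[^"]*")|([=+\-*\/(){};:,])/y;
  let pos = 0;
  while (pos < code.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(code);
    if (match === null) {
      throw new Error('Unexpected character at position ' + pos + ': ' + code[pos]);
    }
    const [, word, number, string, punctuator] = match;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? 'Keyword' : 'Identifier', value: word });
    } else if (number !== undefined) {
      tokens.push({ type: 'Numeric', value: number });
    } else if (string !== undefined) {
      tokens.push({ type: 'String', value: string });
    } else if (punctuator !== undefined) {
      tokens.push({ type: 'Punctuator', value: punctuator });
    }
    pos = pattern.lastIndex;
  }
  return tokens;
}

console.log(tokenize("const a = 'hello world'"));

// Unbalanced brackets are not an error at this stage: each one is just a token.
console.log(tokenize('((').length); // 2
```

The last line shows the point made earlier: the tokenizer happily accepts two opening parentheses, because checking that brackets match is the parser's job.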
Parsing

Also called syntax analysis, it is the process of converting the token sequence into an AST according to a given formal grammar, that is, combining words into sentences. The syntax is validated during the conversion, and a syntax error is thrown if the code is invalid.

Parsing the const a = 'hello world' above generates the following AST:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
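
As a rough illustration of this stage, the sketch below turns the four tokens from the lexical analysis step into the tree shown above. It is a toy that assumes a grammar with exactly one rule (a var, let, or const keyword, a name, an equals sign, and a string), so it is nowhere near a real parser. parseDeclaration and expect are hypothetical names made up for this example.

```javascript
// Toy parser: handles only `var|let|const <name> = <string>` and nothing else.
function parseDeclaration(tokens) {
  let index = 0;

  // Consume the next token, checking its type (and its value, when one is given).
  function expect(type, value) {
    const token = tokens[index];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      const found = token ? token.value : 'end of input';
      throw new SyntaxError('Expected ' + (value || type) + ' but found ' + found);
    }
    index += 1;
    return token;
  }

  const kind = expect('Keyword');
  if (!['var', 'let', 'const'].includes(kind.value)) {
    throw new SyntaxError('Unexpected keyword ' + kind.value);
  }
  const id = expect('Identifier');
  expect('Punctuator', '=');
  const init = expect('String');
  if (index < tokens.length) {
    throw new SyntaxError('Unexpected token ' + tokens[index].value);
  }

  return {
    type: 'Program',
    body: [
      {
        type: 'VariableDeclaration',
        declarations: [
          {
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: id.value },
            init: {
              type: 'Literal',
              value: init.value.slice(1, -1), // strip the surrounding quotes
              raw: init.value
            }
          }
        ],
        kind: kind.value
      }
    ],
    sourceType: 'script'
  };
}

const tokens = [
  { type: 'Keyword', value: 'const' },
  { type: 'Identifier', value: 'a' },
  { type: 'Punctuator', value: '=' },
  { type: 'String', value: "'hello world'" }
];

console.log(JSON.stringify(parseDeclaration(tokens), null, 2));
```

If a token is missing or out of place, for example when the equals sign is left out, expect throws a SyntaxError, which mirrors the syntax validation described above.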

Once we have the AST, we can analyze it and build our own tooling on top of it. The simplest example is renaming a variable in the code.

Practice

Below we rename the variable a defined in the code above to b. To do this, we convert the source code into an AST, modify the contents of the tree, and then convert the AST back into target code. In other words, we go through the process of parsing -> transformation -> generation.

First, we need to work out exactly how the AST of the source code differs from the AST of the target code.
The following is the AST generated from const b = 'hello world':

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "b" // this is the difference
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}

Comparing the two, the only difference is the value of the name property of the id node whose type is Identifier. So we can get what we want by modifying the AST.
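
Before reaching for a library, it may help to see that "modifying the AST" is just walking a plain object and changing a property. The sketch below is a dependency-free toy: walk is a hypothetical helper written for this example, and the tree is the one shown earlier, typed in by hand.

```javascript
// Toy depth-first walker: calls visit on every object that has a string `type`.
function walk(node, visit) {
  if (node === null || typeof node !== 'object') {
    return;
  }
  if (typeof node.type === 'string') {
    visit(node);
  }
  // Arrays are objects too, so their elements are reached through their keys.
  for (const key of Object.keys(node)) {
    walk(node[key], visit);
  }
}

// The AST of const a = 'hello world', written out by hand.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'a' },
          init: { type: 'Literal', value: 'hello world', raw: "'hello world'" }
        }
      ],
      kind: 'const'
    }
  ],
  sourceType: 'script'
};

// Rename only the identifier called a, leaving every other node untouched.
walk(ast, (node) => {
  if (node.type === 'Identifier' && node.name === 'a') {
    node.name = 'b';
  }
});

console.log(ast.body[0].declarations[0].id.name); // b
```

A real traversal library does the same job more carefully, for instance by knowing which properties of each node type hold child nodes.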

We need to install two more packages: estraverse (traverses an AST) and escodegen (generates JS from an AST).

const esprima = require('esprima');
const estraverse = require('estraverse');
const escodegen = require('escodegen');

const program = "const a = 'hello world'";
const ASTree = esprima.parseScript(program);

estraverse.traverse(ASTree, {
    enter(node) {
        changeAToB(node);
    }
});

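// Generate code from the modified tree
const codeAfterChange = escodegen.generate(ASTree);
console.log(codeAfterChange); // const b = 'hello world';

// Rename the identifier a to b; other identifiers are left alone
function changeAToB(node) {
    if (node.type === 'Identifier' && node.name === 'a') {
        node.name = 'b';
    }
}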
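
The generation step can look like a black box, so here is a toy generator that covers only the five node types appearing in this article. It is a sketch under that assumption and says nothing about how escodegen is actually implemented. generate is a hypothetical name, and the tree is the already-renamed one, typed in by hand.

```javascript
// Toy code generator: supports only the node types used in this article.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return node.kind + ' ' + node.declarations.map(generate).join(', ') + ';';
    case 'VariableDeclarator':
      return node.init
        ? generate(node.id) + ' = ' + generate(node.init)
        : generate(node.id);
    case 'Identifier':
      return node.name;
    case 'Literal':
      return node.raw; // reuse the original source text of the literal
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// The tree after renaming a to b.
const renamedTree = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'b' },
          init: { type: 'Literal', value: 'hello world', raw: "'hello world'" }
        }
      ],
      kind: 'const'
    }
  ],
  sourceType: 'script'
};

console.log(generate(renamedTree)); // const b = 'hello world';
```

Each node type knows how to print itself and delegates to its children, which is the mirror image of how the parser built the tree.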

See how easy that was? Once you have a grasp of ASTs, you can do many things. Babel plug-ins are built the same way, just with different libraries.

To learn how to implement a babel plug-in, refer to the official Babel plug-in manual.

Reference articles

  1. [What you should know] Abstract Syntax Tree AST
  2. The transformation of mediocre front-end
  3. Babel plug-in manual

阳呀呀