As a front-end developer, you use ASTs in your work whether or not you know what an AST is. Less, Babel, ESLint, code minification, and even running the JavaScript from our everyday projects in a browser are all based on ASTs. Once you understand ASTs, you can also build small tools of your own to liven up monotonous, boring work.
What is an AST
An AST (Abstract Syntax Tree) is an abstract representation of the syntactic structure of source code. It expresses the syntactic structure of a programming language as a tree, and each node in the tree represents a construct in the source code. The syntax is called "abstract" because it does not show every detail that appears in the real syntax. For example, nested parentheses are implicit in the structure of the tree and are not represented as nodes, and an if-condition-then construct can be represented by a single node with three branches. (The above concept comes from Wikipedia.)
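To make the "abstract" part concrete, here is a minimal sketch with a hand-written tree for the expression (1 + 2) * 3. The node shapes loosely follow the style of the ASTs shown later in this article, and the evaluate helper is hypothetical, written only for illustration. Note that the parentheses never appear as nodes; the grouping is expressed purely by nesting.

```javascript
// Hand-written AST for the expression (1 + 2) * 3.
// There is no "parentheses" node: grouping is implied by nesting.
const tree = {
  type: 'BinaryExpression',
  operator: '*',
  left: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Literal', value: 1 },
    right: { type: 'Literal', value: 2 }
  },
  right: { type: 'Literal', value: 3 }
};

// Hypothetical helper: evaluates the tree recursively, innermost nodes first.
function evaluate(node) {
  if (node.type === 'Literal') return node.value;
  if (node.type === 'BinaryExpression') {
    const left = evaluate(node.left);
    const right = evaluate(node.right);
    switch (node.operator) {
      case '+': return left + right;
      case '-': return left - right;
      case '*': return left * right;
      case '/': return left / right;
    }
  }
  throw new Error('Unsupported node: ' + node.type);
}

console.log(evaluate(tree)); // 9
```

Because the addition sits deeper in the tree than the multiplication, it is evaluated first, which is exactly what the parentheses meant in the source text.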
JavaScript AST conversion tools
For JavaScript, you can use a JS parser to convert JS code into an AST. At present, the more common JS parsers are as follows:
- esprima (popular library)
- Babylon (used in babel)
- acorn (used in webpack)
- espree (derived from acorn, used in eslint)
- astexplorer (online tool that shows the generated AST in real time, with a choice of JS parsers)
The examples in this article all use esprima.
How to convert code to an AST
Converting code into an AST involves two important stages: lexical analysis and syntax analysis.
Lexical analysis
Also called tokenization, this is the process of converting code in the form of a string into a sequence of tokens. A token here is a string that forms the smallest unit of the source code, similar to a word in English. Lexical analysis can also be understood as the process of combining letters into words. It does not care about the relationship between words. For example, during lexical analysis a bracket can be marked as a token, but whether the brackets match is not checked.
Tokens in JavaScript mainly include the following:
- Keywords: var, let, const, etc.
- Identifiers: consecutive characters not enclosed in quotation marks, which may be a variable, a keyword such as if or else, or a built-in constant such as true or false
- Operators: +, -, *, /, etc.
- Numbers: hexadecimal, decimal, octal, scientific notation, etc.
- Strings: text enclosed in quotation marks, such as the value assigned to a variable
- Whitespace: consecutive spaces, line breaks, indentation, etc.
- Comments: a line comment or block comment is a minimal syntactic unit that cannot be split
- Punctuation: braces, parentheses, semicolons, colons, etc.
The following are the tokens that esprima generates from lexical analysis of const a = 'hello world':
[
  {
    "type": "Keyword",
    "value": "const"
  },
  {
    "type": "Identifier",
    "value": "a"
  },
  {
    "type": "Punctuator",
    "value": "="
  },
  {
    "type": "String",
    "value": "'hello world'"
  }
]
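The idea behind this stage can be sketched with a toy tokenizer. This is not how esprima is implemented; the tokenize function, its regular expression, and the short keyword list are all assumptions made for illustration, and it only understands a small subset of JavaScript. It reproduces the four tokens listed above and also shows that unbalanced brackets pass through lexical analysis without complaint.

```javascript
// Toy tokenizer: an illustrative sketch, not esprima's implementation.
const KEYWORDS = new Set(['var', 'let', 'const', 'if', 'else']);

function tokenize(code) {
  // One alternative per token kind; whitespace is matched but dropped.
  // The sticky flag (y) forces each match to start exactly at lastIndex.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+(?:\.\d+)?)|('[^']*'|"[^"]*")|([=+\-*\/(){};,])/y;
  const tokens = [];
  while (pattern.lastIndex < code.length) {
    const start = pattern.lastIndex;
    const match = pattern.exec(code);
    if (!match) throw new SyntaxError('Unexpected character at index ' + start);
    const [, word, number, string, punctuator] = match;
    if (word) {
      tokens.push({ type: KEYWORDS.has(word) ? 'Keyword' : 'Identifier', value: word });
    } else if (number) {
      tokens.push({ type: 'Numeric', value: number });
    } else if (string) {
      tokens.push({ type: 'String', value: string });
    } else if (punctuator) {
      tokens.push({ type: 'Punctuator', value: punctuator });
    }
  }
  return tokens;
}

console.log(tokenize("const a = 'hello world'").map((token) => token.type));
// [ 'Keyword', 'Identifier', 'Punctuator', 'String' ]

// Unmatched brackets are still just two tokens; nothing checks that they pair up.
console.log(tokenize('((').length); // 2
```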
Syntax analysis
Also called parsing, this is the process of converting the token sequence into an AST according to a given formal grammar. In other words, it is the process of combining words into sentences. The syntax is verified during the conversion, and a syntax error is thrown if the code is invalid.
After const a = 'hello world' above is parsed, the generated AST is as follows:
{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
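As a rough illustration of how tokens become a tree and where syntax errors come from, here is a toy parser for a single grammar rule. The parseDeclaration function and its expect helper are hypothetical and handle only the form const name = 'string'; a real parser such as esprima covers the whole language.

```javascript
// Tokens for: const a = 'hello world' (same shape as the token list above).
const tokens = [
  { type: 'Keyword', value: 'const' },
  { type: 'Identifier', value: 'a' },
  { type: 'Punctuator', value: '=' },
  { type: 'String', value: "'hello world'" }
];

// Hypothetical parser for one tiny grammar rule:
//   Declaration := 'const' Identifier '=' String
function parseDeclaration(tokenList) {
  let position = 0;

  // Consume the next token, or throw if it is not what the grammar expects.
  function expect(type, value) {
    const token = tokenList[position];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      const found = token ? token.value : 'end of input';
      throw new SyntaxError('Expected ' + (value || type) + ' but found ' + found);
    }
    position += 1;
    return token;
  }

  expect('Keyword', 'const');
  const id = expect('Identifier');
  expect('Punctuator', '=');
  const init = expect('String');
  if (position < tokenList.length) {
    throw new SyntaxError('Unexpected token ' + tokenList[position].value);
  }

  return {
    type: 'Program',
    body: [{
      type: 'VariableDeclaration',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: id.value },
        // Strip the surrounding quotes for value; keep the source text in raw.
        init: { type: 'Literal', value: init.value.slice(1, -1), raw: init.value }
      }],
      kind: 'const'
    }],
    sourceType: 'script'
  };
}

const ast = parseDeclaration(tokens);
console.log(ast.body[0].declarations[0].id.name); // a
```

Passing an incomplete token list, for example only the first two tokens, makes expect throw a SyntaxError, which mirrors the grammar check described above.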
Once we have the AST, we can analyze it and build our own things on top of it. The simplest example is replacing a variable in the code with another name.
Practice
Below we will replace the variable a defined in the code above with the variable b. To do this, we need to convert the source code into an AST, perform some operations on it to change the content of the tree, and then convert the AST into the target code. That is, we go through the process of parsing -> transformation -> generation.
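The last step of that pipeline, generation, can be sketched without any library. The generate function below is hypothetical and supports only the handful of node types that appear in this article; a real generator handles every node type and may format its output differently.

```javascript
// Hypothetical generator for the small node set used in this article.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return node.kind + ' ' + node.declarations.map(generate).join(', ') + ';';
    case 'VariableDeclarator':
      return generate(node.id) + (node.init ? ' = ' + generate(node.init) : '');
    case 'Identifier':
      return node.name;
    case 'Literal':
      return node.raw; // reuse the original source text of the literal
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// The AST of const a = 'hello world', written out by hand.
const sourceTree = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'Literal', value: 'hello world', raw: "'hello world'" }
    }],
    kind: 'const'
  }],
  sourceType: 'script'
};

console.log(generate(sourceTree)); // const a = 'hello world';
```

Each node type knows how to print itself from its children, so any change made to the tree shows up in the generated code.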
First, we need to analyze how the AST of the source code differs from the AST generated from the target code.
The following is the AST generated from const b = 'hello world':
{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "b" // this is the difference
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
Comparing the two, we find that the only difference is the value of the name property on the id node whose type is Identifier. So we can modify the AST to meet our needs.
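The comparison can also be done mechanically. The sketch below uses a hypothetical diffPaths helper that walks two plain-object trees in parallel and reports the paths at which they differ; the declaration builder is only a compact way to write the two ASTs shown in this article.

```javascript
// Hypothetical helper: lists the paths at which two plain-object trees differ.
function diffPaths(a, b, path = '') {
  const bothObjects = a !== null && b !== null &&
    typeof a === 'object' && typeof b === 'object';
  if (!bothObjects) {
    return a === b ? [] : [path];
  }
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  const differences = [];
  for (const key of keys) {
    const childPath = path ? path + '.' + key : key;
    differences.push(...diffPaths(a[key], b[key], childPath));
  }
  return differences;
}

// Compact builder for the AST of: const <name> = 'hello world'
const declaration = (name) => ({
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name },
      init: { type: 'Literal', value: 'hello world', raw: "'hello world'" }
    }],
    kind: 'const'
  }],
  sourceType: 'script'
});

console.log(diffPaths(declaration('a'), declaration('b')));
// [ 'body.0.declarations.0.id.name' ]
```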
We need to install two packages: estraverse (traverses the AST) and escodegen (generates JS from the AST).
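const esprima = require('esprima');
const estraverse = require('estraverse');
const escodegen = require('escodegen');

const program = "const a = 'hello world'";

// 1. Parse: convert the source code into an AST.
const ASTree = esprima.parseScript(program);

// Rename the identifier a to b; other identifiers are left alone.
function changeAToB(node) {
  if (node.type === 'Identifier' && node.name === 'a') {
    node.name = 'b';
  }
}

// 2. Transform: visit every node and apply the rename.
estraverse.traverse(ASTree, {
  enter(node) {
    changeAToB(node);
  }
});

// 3. Generate: turn the modified AST back into code.
const generatedCode = escodegen.generate(ASTree);
console.log(generatedCode); // const b = 'hello world';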
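For readers who want to see what the traversal step does without installing anything, it can be approximated by a small recursive walker. The walk function is a hypothetical stand-in for estraverse.traverse, not its implementation, and the tree is written by hand for a slightly longer program so that the effect on several identifiers is visible.

```javascript
// Hypothetical stand-in for estraverse.traverse: visits every node depth-first.
function walk(node, enter) {
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string') enter(node); // only AST nodes have a type
  for (const key of Object.keys(node)) walk(node[key], enter);
}

// Hand-written AST for: const a = 'hello world'; const c = a;
const tree = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'a' },
        init: { type: 'Literal', value: 'hello world', raw: "'hello world'" }
      }]
    },
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'c' },
        init: { type: 'Identifier', name: 'a' }
      }]
    }
  ],
  sourceType: 'script'
};

const names = [];
walk(tree, (node) => {
  if (node.type === 'Identifier') {
    if (node.name === 'a') node.name = 'b'; // rename only the target variable
    names.push(node.name);
  }
});

console.log(names); // [ 'b', 'c', 'b' ]
```

Checking the name before renaming keeps c untouched. This sketch still ignores scoping, so two unrelated variables that happen to share the name a would both be renamed.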
See, isn't that easy? Once you have mastered ASTs, you can do many things. The various Babel plug-ins are built the same way; only the libraries used are different.
For how to implement a Babel plug-in, refer to the official Babel plug-in handbook.
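As a rough idea of what that looks like, a Babel plug-in is a function that returns an object with a visitor, and Babel calls the visitor method whose name matches the node type it is visiting. The sketch below is an assumption-based illustration of the same rename; the plug-in name is made up, and the fake path object exists only so the snippet runs without Babel installed.

```javascript
// Sketch of a Babel plug-in that renames the identifier a to b.
function renameAToB() {
  return {
    name: 'rename-a-to-b', // hypothetical plug-in name
    visitor: {
      // Called for every Identifier node; path.node is the AST node itself.
      Identifier(path) {
        if (path.node.name === 'a') {
          path.node.name = 'b';
        }
      }
    }
  };
}

// Babel is not needed to see the shape: call the visitor with a stand-in path.
const fakePath = { node: { type: 'Identifier', name: 'a' } };
renameAToB().visitor.Identifier(fakePath);
console.log(fakePath.node.name); // b
```

In a real project the function would be exported from a module and listed under plugins in the Babel configuration; the handbook covers those details.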