
Foreword

Last time, in "Refactoring the script interpreter with Antlr", I used Antlr to rewrite the first version of the interpreter. Since then I have been adding features, and it now supports scopes and function calls.

 int b= 10;
int foo(int age){
    for(int i=0;i<10;i++){
        age++;
    }
    return b+age;
}
int add(int a,int b) {
    int e = foo(10);
    e = e+10;
    return a+b+3+e;
}
add(2,20);
// Output:65

Most of the grammar rules are modeled on Java. The interpreter currently supports:

  • Function declaration and invocation.
  • Stack push and pop for function calls, so that a function's local variables are destroyed when the function exits.
  • Scopes: an inner scope can access variables of an outer scope.
  • Basic expressions and operators such as i++, != and ==.

The focus, and the hard part, of this implementation is scope and function calls, and getting them working satisfied my curiosity. Before discussing them, let's look at how a simple variable declaration and access statement is implemented, so that the rest is easier to follow.

Variable declaration

 int a=10;
a;
Built-in functions such as a console output function print() have not been implemented yet, so here the value is read by accessing the variable directly.

The result after running is as follows:

First look at the syntax of the variable declaration statement:

 variableDeclarators
    : typeType variableDeclarator (',' variableDeclarator)*
    ;

variableDeclarator
    : variableDeclaratorId ('=' variableInitializer)?
    ;
typeList
    : typeType (',' typeType)*
    ;
typeType
    : (functionType | primitiveType) ('[' ']')*
    ;
primitiveType
    : INT
    | STRING
    | FLOAT
    | BOOLEAN
    ;

The grammar alone is not very intuitive; the generated AST makes it much clearer:

At compile time, the subtree on the left, BlockVardeclar, corresponds to int a=10; and the subtree on the right, blockStm, corresponds to the variable access a.

Running a program is divided into compile time and runtime, and the corresponding work is as follows:

  • Compile time: traverse the AST, perform semantic analysis, generate the symbol table and type table, resolve references, and run checks such as whether variable or function names are duplicated and whether private variables can be accessed.
  • Runtime: Obtain data from the symbol table and type table generated during compilation, and execute specific code logic.

Visit AST

The compile time and runtime just mentioned map onto the two ways of visiting an AST, which are also the two ways Antlr provides.

Listener mode

The first is Listener mode. As the name suggests, we implement the listener interfaces provided by Antlr, whose methods correspond to the different nodes of the AST.

Antlr then walks the tree automatically and calls back our methods when it enters and exits each node. These methods have no return value, so we must store any data produced during the traversal ourselves.

This suits the compile-time phase described above: the data produced during the traversal naturally goes into containers such as the symbol table and type table.


Taking this code as an example, we implement the enter and exit listeners for the program root node and the for loop node; their logic runs when Antlr reaches these nodes.

https://github.com/crossoverJie/gscript/blob/main/resolver/type_scope_resolver.go
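To show the shape of Listener mode without depending on Antlr, here is a minimal hand-rolled sketch. It is independent of the linked source: Node, Listener, Walk and scopeCollector are hypothetical names, and Antlr's generated listener interfaces have one enter/exit pair per grammar rule rather than a single generic pair. The point is only that the callbacks return nothing, so the listener keeps its results in its own fields.

```go
package main

import "fmt"

// Node is a hypothetical, hand-rolled AST node (not an Antlr type).
type Node struct {
	Kind     string
	Children []*Node
}

// Listener mirrors the enter/exit callback style: no return values.
type Listener interface {
	Enter(n *Node)
	Exit(n *Node)
}

// Walk traverses the tree depth-first and fires the callbacks,
// playing the role of the tree walker that drives a listener.
func Walk(l Listener, n *Node) {
	l.Enter(n)
	for _, c := range n.Children {
		Walk(l, c)
	}
	l.Exit(n)
}

// scopeCollector stores what it learns in its own fields,
// because the callbacks cannot return anything.
type scopeCollector struct {
	depth  int
	scopes []string // entries look like "for@1": node kind and nesting depth
}

func (s *scopeCollector) Enter(n *Node) {
	if n.Kind == "prog" || n.Kind == "for" {
		s.scopes = append(s.scopes, fmt.Sprintf("%s@%d", n.Kind, s.depth))
		s.depth++
	}
}

func (s *scopeCollector) Exit(n *Node) {
	if n.Kind == "prog" || n.Kind == "for" {
		s.depth--
	}
}

// collectScopes builds a small sample tree and runs the listener over it.
func collectScopes() []string {
	tree := &Node{Kind: "prog", Children: []*Node{
		{Kind: "varDecl"},
		{Kind: "for", Children: []*Node{{Kind: "stmt"}}},
	}}
	c := &scopeCollector{}
	Walk(c, tree)
	return c.scopes
}

func main() {
	fmt.Println(collectScopes()) // prints: [prog@0 for@1]
}
```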

Visitor mode

Visitor mode is the exact opposite of Listener mode: we control which AST nodes are visited, and each visit returns a value, which makes it a good fit for the program's runtime.

With the data stored at compile time, various features can be implemented.

Taking the figure above as an example, when visiting the Prog node we can fetch the scope that was recorded for this node at compile time, and we decide ourselves which node to visit next, here VisitBlockStms. Visiting other nodes is possible too, but usually we follow the structure defined in the grammar.
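A minimal sketch of the Visitor idea, again hand-rolled rather than generated by Antlr: Node, Evaluator and evalSample are hypothetical names, and the vars map stands in for the data stored at compile time. Each Visit call chooses which children to descend into and returns a value to its caller.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is a hypothetical AST node used only for this sketch.
type Node struct {
	Kind     string // "prog", "add", "int" or "var"
	Text     string // literal text or variable name
	Children []*Node
}

// Evaluator plays the visitor role; vars stands in for compile-time data.
type Evaluator struct {
	vars map[string]int
}

// Visit decides which children to visit and returns the node's value.
func (e *Evaluator) Visit(n *Node) int {
	switch n.Kind {
	case "prog":
		// The value of a program is the value of its last statement.
		last := 0
		for _, c := range n.Children {
			last = e.Visit(c)
		}
		return last
	case "add":
		return e.Visit(n.Children[0]) + e.Visit(n.Children[1])
	case "int":
		v, err := strconv.Atoi(n.Text)
		if err != nil {
			panic(err)
		}
		return v
	case "var":
		return e.vars[n.Text]
	default:
		panic("unknown node kind: " + n.Kind)
	}
}

// evalSample evaluates the expression a+5 with a bound to 10.
func evalSample() int {
	tree := &Node{Kind: "prog", Children: []*Node{
		{Kind: "add", Children: []*Node{
			{Kind: "var", Text: "a"},
			{Kind: "int", Text: "5"},
		}},
	}}
	e := &Evaluator{vars: map[string]int{"a": 10}}
	return e.Visit(tree)
}

func main() {
	fmt.Println(evalSample()) // prints: 15
}
```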

Scope

Even when the same grammar produces the same AST, different implementations of the traversal lead to different semantics. This is where the semantic analysis of each language differs.

For example, Java does not allow a local variable to be redeclared in a nested block scope, but JavaScript does.

With the above foundation, let's take a look at how scope is implemented.

Take this code again as an example:

 int a=10;
a;

I drew a simple diagram of the process:

During compilation, a scope is written for the current node, and the variable a is written into that scope.

Writing the scope and writing the variable are handled by two separate Listeners. The implementation can be found in the source code linked below.

First pass:
https://github.com/crossoverJie/gscript/blob/main/resolver/type_scope_resolver.go#L21

Second pass:
https://github.com/crossoverJie/gscript/blob/main/resolver/type_resolver.go#L59

Then comes the runtime, which reads the scope and its variables from the data generated at compile time. One detail matters when looking up a variable:
if it cannot be found in the current scope, the lookup must continue in the parent scope, as in the following case:

 int b= 10;
int foo(){
    return b;
}

Here b is not defined in the function's own scope; it exists only in the parent scope.

The parent relationship is established when a scope is created. By default, the scope that is current at the time a new scope is written becomes the parent of that new scope.
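The parent-chain lookup can be sketched as follows. This is a minimal illustration, not the gscript implementation: Scope, NewScope, Define and Resolve are hypothetical names, and values are simplified to int. The parent pointer is set when the scope is created, and Resolve walks outward until it finds the name or runs out of scopes.

```go
package main

import "fmt"

// Scope is a hypothetical scope: its own variables plus a link to its parent.
type Scope struct {
	name   string
	parent *Scope
	vars   map[string]int
}

// NewScope creates a scope and records its parent at creation time.
func NewScope(name string, parent *Scope) *Scope {
	return &Scope{name: name, parent: parent, vars: make(map[string]int)}
}

// Define writes a variable into this scope only.
func (s *Scope) Define(name string, value int) {
	s.vars[name] = value
}

// Resolve looks in the current scope first, then in each parent in turn.
func (s *Scope) Resolve(name string) (int, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if v, ok := cur.vars[name]; ok {
			return v, true
		}
	}
	return 0, false
}

func main() {
	global := NewScope("global", nil)
	global.Define("b", 10)

	// foo's scope has no b of its own, so the lookup falls back to global.
	foo := NewScope("foo", global)
	fmt.Println(foo.Resolve("b")) // prints: 10 true
}
```

Because Resolve checks the current scope first, a variable defined in the inner scope takes precedence over one with the same name in the parent.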

The key code is as follows:

The fourth step, getting the value of the variable, visits the literal node in the AST to obtain the value. The core code is as follows:

Function

The core of a function call is to push all the data of the current function onto a stack at runtime and pop it off once the call completes, so that data local to the function body is released automatically when the function exits.

The core code is as follows:

 int b= 10;
int foo(){
    return b;
}
int func(int a,int b) {
    int e = foo();
    return a+b+3+e;
}
func(2,20);

Even when functions call other functions, as above, there is nothing to worry about. Each call simply writes its data onto the stack when the function body runs, and the stack frames are popped one by one as the functions return.

This is similar to the algorithm for matching brackets such as {[()]}; in essence it is a recursive call.
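The push and pop behavior described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the gscript code: Frame, Stack, callFoo and callFunc are hypothetical names, and callFoo and callFunc are hand-written stand-ins for the script functions foo and func above rather than interpreted code.

```go
package main

import "fmt"

// Frame holds the local variables of one function invocation (hypothetical).
type Frame struct {
	name   string
	locals map[string]int
}

// Stack is a simple call stack of frames.
type Stack struct {
	frames []*Frame
}

// Push creates a frame for a call that is starting.
func (s *Stack) Push(name string) *Frame {
	f := &Frame{name: name, locals: make(map[string]int)}
	s.frames = append(s.frames, f)
	return f
}

// Pop discards the top frame, releasing its locals.
func (s *Stack) Pop() {
	s.frames = s.frames[:len(s.frames)-1]
}

// Depth reports how many frames are currently live.
func (s *Stack) Depth() int {
	return len(s.frames)
}

// globalB stands in for the script's global variable b.
var globalB = 10

// callFoo mimics: int foo(){ return b; }
func callFoo(s *Stack) int {
	s.Push("foo")
	defer s.Pop() // the frame is released when the function exits
	return globalB
}

// callFunc mimics: int func(int a,int b){ int e = foo(); return a+b+3+e; }
func callFunc(s *Stack, a, b int) int {
	f := s.Push("func")
	defer s.Pop()
	f.locals["a"], f.locals["b"] = a, b
	f.locals["e"] = callFoo(s) // the nested call pushes and pops its own frame
	return f.locals["a"] + f.locals["b"] + 3 + f.locals["e"]
}

func main() {
	s := &Stack{}
	fmt.Println(callFunc(s, 2, 20), s.Depth()) // prints: 35 0
}
```

The final depth of 0 shows that every frame pushed during the nested calls has been popped again by the time the outermost call returns.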

Summary

Because space is limited, many details are not covered here. If you are interested, you can run the unit tests directly and step through them in a debugger.

https://github.com/crossoverJie/gscript/blob/main/compiler_test.go

The current version is still fairly rudimentary. For example, the only basic type is int, and there are no commonly used built-in functions.

Later versions will gradually improve on this, for example by adding:

  • Multiple return values from functions.
  • Custom types.
  • Closures.

I will keep filling in these and other features. I hope that by the end of the year I can use gscript to write a web server, which I would consider a milestone.

At this stage, a simple REPL tool has also been implemented. You can install and try it out:

Source address:
https://github.com/crossoverJie/gscript


crossoverJie