Foreword

The previous version of the script interpreter GScript implemented the four basic arithmetic operations and AST generation.

When I wanted to add a % operator for modulo, I found the work tedious and almost entirely repetitive. It came down to two steps:

  1. Add support for the % symbol to the lexer.
  2. Implement the logic for the % token where the parser traverses the AST.

Lexing and AST traversal are entirely repetitive work, so can these two steps be simplified?
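To make the repetition concrete, here is a minimal sketch of what a hand-written lexer and evaluator might look like. The names (TokenKind, Tokenize, Apply) are hypothetical and are not taken from the earlier GScript code; the comments mark the places that have to change when % is added.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// TokenKind and the names below are hypothetical; they only illustrate
// the shape of a hand-written lexer, not GScript's earlier code.
type TokenKind int

const (
	Number TokenKind = iota
	Plus
	Sub
	Mult
	Div
	Mod // step 1a: declare a new token kind for '%'
)

// Token is one lexical unit together with its source text.
type Token struct {
	Kind TokenKind
	Text string
}

// operators maps each single-character operator to its token kind.
var operators = map[rune]TokenKind{
	'+': Plus,
	'-': Sub,
	'*': Mult,
	'/': Div,
	'%': Mod, // step 1b: teach the lexer to recognise '%'
}

// Tokenize splits src into number and operator tokens, skipping whitespace.
func Tokenize(src string) ([]Token, error) {
	var tokens []Token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r >= '0' && r <= '9':
			start := i
			for i < len(runes) && runes[i] >= '0' && runes[i] <= '9' {
				i++
			}
			tokens = append(tokens, Token{Kind: Number, Text: string(runes[start:i])})
		default:
			kind, ok := operators[r]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q at offset %d", r, i)
			}
			tokens = append(tokens, Token{Kind: kind, Text: string(r)})
			i++
		}
	}
	return tokens, nil
}

// Apply evaluates a single binary operation.
func Apply(op TokenKind, lhs, rhs int) (int, error) {
	switch op {
	case Plus:
		return lhs + rhs, nil
	case Sub:
		return lhs - rhs, nil
	case Mult:
		return lhs * rhs, nil
	case Div:
		if rhs == 0 {
			return 0, errors.New("division by zero")
		}
		return lhs / rhs, nil
	case Mod: // step 2: implement the logic for the new token
		if rhs == 0 {
			return 0, errors.New("modulo by zero")
		}
		return lhs % rhs, nil
	default:
		return 0, fmt.Errorf("unsupported operator kind %d", op)
	}
}

func main() {
	tokens, err := Tokenize("7 % 3")
	if err != nil {
		panic(err)
	}
	result, err := Apply(tokens[1].Kind, 7, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // prints 1
}
```

Every new operator touches the token kinds, the lexer table, and the evaluation switch, which is the kind of boilerplate a generator can take over.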

Antlr

Antlr is a common tool for solving exactly these problems. With it, we only need to write a grammar file, and the lexer and parser are generated automatically, with support for several target languages.

Let's use GScript as an example to see how Antlr generates a lexer for us.

 func TestGScriptVisitor_Visit_Lexer(t *testing.T) {
    expression := "(2+3) * 2"
    input := antlr.NewInputStream(expression)
    lexer := parser.NewGScriptLexer(input)
    for {
        // Named tok so that it does not shadow the *testing.T parameter.
        tok := lexer.NextToken()
        if tok.GetTokenType() == antlr.TokenEOF {
            break
        }
        // Print the token's symbolic name, its text, and its column.
        fmt.Printf("%s (%q) %d\n",
            lexer.SymbolicNames[tok.GetTokenType()], tok.GetText(), tok.GetColumn())
    }
}
 //output:
 ("(") 0
DECIMAL_LITERAL ("2") 1
PLUS ("+") 2
DECIMAL_LITERAL ("3") 3
 (")") 4
MULT ("*") 6
DECIMAL_LITERAL ("2") 8

Antlr automatically splits the expression into tokens. While iterating over them we can also get each token's line, column, and other position information, which is useful for syntax checking at compile time.

To get all of this, we only need to write the lexer and grammar rules.

The lexical and grammatical rules corresponding to the example just now are as follows:

 expr
    : '(' expr ')'                        #NestedExpr
    | liter=literal #Liter
    | lhs=expr bop=( MULT | DIV ) rhs=expr #MultDivExpr
    | lhs=expr bop=MOD rhs=expr            #ModExpr
    | lhs=expr bop=( PLUS | SUB ) rhs=expr #PlusSubExpr
    | expr bop=(LE | GE | GT | LT ) expr # GLe
    | expr bop=(EQUAL | NOTEQUAL) expr # EqualOrNot
    ;
DECIMAL_LITERAL:    ('0' | [1-9] (Digits? | '_'+ Digits)) [lL]?;
Full rules: https://github.com/crossoverJie/gscript/blob/main/GScript.g4

Run:

 antlr -Dlanguage=Go -o parser -visitor -no-listener GScript.g4

This generates the Go code for us (the default target is Java). For Antlr's lexer and grammar rules and its installation steps, please refer to the official website.

To implement the actual language semantics, we only need to implement the relevant interface. Antlr traverses the AST automatically (the traversal can also be controlled manually) and, on visiting each kind of AST node, calls back into the interface we implemented, so we can write our own evaluation logic there.
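The callback style can be illustrated without Antlr. The following is a minimal sketch of the visitor pattern in plain Go; Node, Visitor, Number, Binary, and Evaluator are hypothetical names chosen for illustration, whereas Antlr generates its own context types and visitor interface from the grammar.

```go
package main

import "fmt"

// Visitor lists one callback per node type. All names in this sketch are
// hypothetical; Antlr generates the real equivalents from the grammar.
type Visitor interface {
	VisitNumber(n *Number) interface{}
	VisitBinary(n *Binary) interface{}
}

// Node is anything that can dispatch itself to the matching callback.
type Node interface {
	Accept(v Visitor) interface{}
}

// Number is a leaf node holding an integer literal.
type Number struct{ Value int }

// Binary is an inner node with an operator and two operands.
type Binary struct {
	Op       string
	Lhs, Rhs Node
}

func (n *Number) Accept(v Visitor) interface{} { return v.VisitNumber(n) }
func (n *Binary) Accept(v Visitor) interface{} { return v.VisitBinary(n) }

// Evaluator implements Visitor and computes the value of the tree.
type Evaluator struct{}

func (e *Evaluator) VisitNumber(n *Number) interface{} { return n.Value }

func (e *Evaluator) VisitBinary(n *Binary) interface{} {
	// Visiting the children first yields the operand values.
	lhs := n.Lhs.Accept(e).(int)
	rhs := n.Rhs.Accept(e).(int)
	switch n.Op {
	case "+":
		return lhs + rhs
	case "*":
		return lhs * rhs
	default:
		panic(fmt.Sprintf("unsupported operator %q", n.Op))
	}
}

func main() {
	// Hand-built tree for (2+3) * 2; a generated parser would build it.
	tree := &Binary{
		Op:  "*",
		Lhs: &Binary{Op: "+", Lhs: &Number{Value: 2}, Rhs: &Number{Value: 3}},
		Rhs: &Number{Value: 2},
	}
	fmt.Println(tree.Accept(&Evaluator{})) // prints 10
}
```

Supporting a new node type then means adding one callback, while the dispatch code stays the same.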

Take the new modulo operation here as an example:

 func (v *GScriptVisitor) VisitModExpr(ctx *parser.ModExprContext) interface{} {
    lhs := v.Visit(ctx.GetLhs())
    rhs := v.Visit(ctx.GetRhs())
    return lhs.(int) % rhs.(int)
}

When Antlr calls back the VisitModExpr method, the values on the left and right of the % symbol are available, so all that remains is to apply the operation.
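Note that in Go a single-value type assertion such as lhs.(int) panics when the operand is not an int, and % with a zero divisor panics at run time. A more defensive variant might look like the sketch below; safeMod is a hypothetical helper, not part of GScript, and returning an error instead of panicking is only one possible design.

```go
package main

import (
	"errors"
	"fmt"
)

// safeMod is a hypothetical helper: it checks both operand types and the
// divisor before applying %, and reports problems as errors.
func safeMod(lhs, rhs interface{}) (int, error) {
	l, ok := lhs.(int)
	if !ok {
		return 0, fmt.Errorf("left operand of %% must be int, got %T", lhs)
	}
	r, ok := rhs.(int)
	if !ok {
		return 0, fmt.Errorf("right operand of %% must be int, got %T", rhs)
	}
	if r == 0 {
		return 0, errors.New("modulo by zero")
	}
	return l % r, nil
}

func main() {
	fmt.Println(safeMod(7, 3)) // prints "1 <nil>"

	if _, err := safeMod(7, 0); err != nil {
		fmt.Println("error:", err)
	}
	if _, err := safeMod("7", 3); err != nil {
		fmt.Println("error:", err)
	}
}
```

A visitor method could call such a helper and decide how to surface the error, for example by recording it on the visitor.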

Following the same pattern, this release adds statement support, such as the if/else in the example below. The syntax looks like this:

 func TestGScriptVisitor_VisitIfElse8(t *testing.T) {
    expression := `
if(3!=(1+2)){
    return 1+3
} else {
    return false
}`
    input := antlr.NewInputStream(expression)
    lexer := parser.NewGScriptLexer(input)
    stream := antlr.NewCommonTokenStream(lexer, 0)
    parser := parser.NewGScriptParser(stream)
    parser.BuildParseTrees = true
    tree := parser.Prog()
    visitor := GScriptVisitor{}
    var result = visitor.Visit(tree)
    fmt.Println(expression, " result:", result)
    assert.Equal(t, false, result) // testify expects (t, expected, actual)
}

Antlr has other advantages as well. Among other things, it takes care of:

  • Left recursion.
  • Ambiguity.
  • Operator precedence.
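For comparison, a hand-written parser has to encode precedence and associativity itself, for example with precedence climbing. The sketch below rests on stated assumptions: the precedence table mirrors the order of the alternatives in the grammar excerpt above (in Antlr 4, alternatives listed earlier in a left-recursive rule bind tighter), the input is already split into tokens, and exprParser and Eval are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// precedence is an assumed table for this sketch: a larger number binds
// tighter. It follows the alternative order of the grammar excerpt.
var precedence = map[string]int{
	"*": 3, "/": 3,
	"%": 2,
	"+": 1, "-": 1,
}

// exprParser walks over pre-split tokens such as ["2", "+", "3"].
type exprParser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at the end of input.
func (p *exprParser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// next returns the current token and advances.
func (p *exprParser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parsePrimary reads a number or a parenthesised expression.
func (p *exprParser) parsePrimary() (int, error) {
	tok := p.next()
	if tok == "(" {
		v, err := p.parseExpr(1)
		if err != nil {
			return 0, err
		}
		if p.next() != ")" {
			return 0, errors.New("expected ')'")
		}
		return v, nil
	}
	n, err := strconv.Atoi(tok)
	if err != nil {
		return 0, fmt.Errorf("expected a number, got %q", tok)
	}
	return n, nil
}

// parseExpr only consumes operators whose precedence is at least minPrec.
func (p *exprParser) parseExpr(minPrec int) (int, error) {
	lhs, err := p.parsePrimary()
	if err != nil {
		return 0, err
	}
	for {
		op := p.peek()
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			return lhs, nil
		}
		p.next()
		// prec+1 makes operators of equal precedence left-associative.
		rhs, err := p.parseExpr(prec + 1)
		if err != nil {
			return 0, err
		}
		switch op {
		case "+":
			lhs += rhs
		case "-":
			lhs -= rhs
		case "*":
			lhs *= rhs
		case "/":
			if rhs == 0 {
				return 0, errors.New("division by zero")
			}
			lhs /= rhs
		case "%":
			if rhs == 0 {
				return 0, errors.New("modulo by zero")
			}
			lhs %= rhs
		}
	}
}

// Eval parses and evaluates a complete token sequence.
func Eval(tokens []string) (int, error) {
	p := &exprParser{tokens: tokens}
	v, err := p.parseExpr(1)
	if err != nil {
		return 0, err
	}
	if p.pos != len(p.tokens) {
		return 0, fmt.Errorf("unexpected token %q", p.peek())
	}
	return v, nil
}

func main() {
	fmt.Println(Eval([]string{"2", "+", "3", "*", "2"})) // prints "8 <nil>"
}
```

With Antlr, the same information lives in the grammar: the order of the alternatives in the expr rule takes the place of the precedence table.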

It is also recommended to install the Antlr plug-in in your IDE. It lets you view the syntax tree visually, which makes debugging much easier.


Upgrading xjson

With the statement support provided by GScript, xjson also gains some interesting new syntax:

xjson's arithmetic syntax … Antlr, so in order to support GScript statements … code.

This also shows how important front-end tools like Antlr are; the efficiency gain is obvious.

Summary

With the help of Antlr, GScript will go on to support function calls, a more complete type system, object-oriented features, and more. If you are interested, stay tuned.

Source address:
https://github.com/crossoverJie/gscript

https://github.com/crossoverJie/xjson


crossoverJie