Foreword
In the previous article on syntax analysis, we saw how the Go compiler parses the various declaration types (import, var, const, func, and so on) in a Go source file according to the Go grammar. The syntax analysis phase parses the entire source file into a File structure and stores each declaration of the source file in File.DeclList. The result is a syntax tree with the File structure as the root node and ImportDecl, ConstDecl, TypeDecl, VarDecl, FuncDecl, etc. as the child nodes.
First of all, it helps to be clear about what the abstract syntax tree is for: it is the structure on which type checking, code style checking, and similar passes operate. In short, with the abstract syntax tree the compiler can locate any position in the code precisely and run a series of operations and checks on it.
This article covers the construction of the abstract syntax tree. The front end of a compiler must turn the source program into an intermediate representation that the back end can work with, and the abstract syntax tree is a common tree-shaped intermediate representation. So the main question of this article is: what does the Go compiler do to turn the syntax tree into an abstract syntax tree?
Abstract Syntax Tree Construction Overview
This section gives an overall picture of the abstract syntax tree construction process, so it covers a lot of ground quickly. The implementation details are introduced in the next part.
From the syntax parsing stage in the previous article, we know that the Go compiler starts multiple goroutines to parse each source file into a syntax tree. The code lives in: src/cmd/compile/internal/gc/noder.go → parseFiles
func parseFiles(filenames []string) uint {
noders := make([]*noder, 0, len(filenames))
// Limit the number of simultaneously open files.
sem := make(chan struct{}, runtime.GOMAXPROCS(0)+10)
for _, filename := range filenames {
p := &noder{
basemap: make(map[*syntax.PosBase]*src.PosBase),
err: make(chan syntax.Error),
}
noders = append(noders, p)
// Start one goroutine per source file to parse it.
go func(filename string) {
sem <- struct{}{}
defer func() { <-sem }()
defer close(p.err)
base := syntax.NewFileBase(filename)
f, err := os.Open(filename)
if err != nil {
p.error(syntax.Error{Msg: err.Error()})
return
}
defer f.Close()
p.file, _ = syntax.Parse(base, f, p.error, p.pragma, syntax.CheckBranches) // errors are tracked via p.error
}(filename)
}
// Build each syntax tree into an abstract syntax tree.
var lines uint
for _, p := range noders {
for e := range p.err {
p.yyerrorpos(e.Pos, "%s", e.Msg)
}
p.node() // core of the abstract syntax tree construction
lines += p.file.Lines
p.file = nil // release memory
......
}
localpkg.Height = myheight
return lines
}
After parsing the source file into a syntax tree, the Go compiler builds each syntax tree (source file) into an abstract syntax tree. The core code is in the p.node() method:
func (p *noder) node() {
......
xtop = append(xtop, p.decls(p.file.DeclList)...)
......
clearImports()
}
The core of the p.node() method is the call p.decls(p.file.DeclList), which converts the declarations in the source file into abstract syntax trees one by one. Each import, var, type, const, or func declaration becomes a root node, and that root node contains the child nodes of the declaration. The implementation of p.decls(p.file.DeclList) is as follows:
func (p *noder) decls(decls []syntax.Decl) (l []*Node) {
var cs constState
for _, decl := range decls {
p.setlineno(decl)
switch decl := decl.(type) {
case *syntax.ImportDecl:
p.importDecl(decl)
case *syntax.VarDecl:
l = append(l, p.varDecl(decl)...)
case *syntax.ConstDecl:
l = append(l, p.constDecl(decl, &cs)...)
case *syntax.TypeDecl:
l = append(l, p.typeDecl(decl))
case *syntax.FuncDecl:
l = append(l, p.funcDecl(decl))
default:
panic("unhandled Decl")
}
}
return
}
Overall, this method converts each declaration in the syntax tree into an abstract syntax tree (a Node structure) rooted at that declaration, so the syntax tree as a whole ends up as a slice of nodes ([]*Node).
The Node structure looks like this:
type Node struct {
// Tree structure.
// Generic recursive walks should follow these fields.
Left *Node // left child node
Right *Node // right child node
Ninit Nodes
Nbody Nodes
List Nodes // left subtree
Rlist Nodes // right subtree
// most nodes
Type *types.Type // type of the node
Orig *Node // original form, for printing, and tracking copies of ONAMEs
// func
Func *Func // function information
// ONAME, OTYPE, OPACK, OLABEL, some OLITERAL
Name *Name // variable name, type name, package name, etc.
Sym *types.Sym // various
E interface{} // Opt or Val, see methods below
// Various. Usually an offset into a struct. For example:
// - ONAME nodes that refer to local variables use it to identify their stack frame position.
// - ODOT, ODOTPTR, and ORESULT use it to indicate offset relative to their base address.
// - OSTRUCTKEY uses it to store the named field's offset.
// - Named OLITERALs use it to store their ambient iota value.
// - OINLMARK stores an index into the inlTree data structure.
// - OCLOSURE uses it to store ambient iota value, if any.
// Possibly still more uses. If you find any, document them.
Xoffset int64
Pos src.XPos
flags bitset32
Esc uint16 // EscXXX
Op Op // operation (kind) of the current node
aux uint8
}
Knowing the meaning of the commented fields above is basically enough. The key field is Op, which identifies the kind of each node. All Op values are defined in src/cmd/compile/internal/gc/syntax.go. Their names all start with O, they are all integers, and each Op has its own semantics:
const (
OXXX Op = iota
// names
ONAME // var or func name
// Unnamed arg or return value: f(int, string) (int, error) { etc }
// Also used for a qualified package identifier that hasn't been resolved yet.
ONONAME
OTYPE // type name
OPACK // import
OLITERAL // literal
// expressions
OADD // Left + Right (addition)
OSUB // Left - Right (subtraction)
OOR // Left | Right (bitwise OR)
OXOR // Left ^ Right
OADDSTR // +{List} (string addition, list elements are strings)
OADDR // &Left
......
// Left = Right or (if Colas=true) Left := Right
// If Colas, then Ninit includes a DCL node for Left.
OAS
// List = Rlist (x, y, z = a, b, c) or (if Colas=true) List := Rlist
// If Colas, then Ninit includes DCL nodes for List
OAS2
OAS2DOTTYPE // List = Right (x, ok = I.(int))
OAS2FUNC // List = Right (x, y = f())
......
)
For example, when the Op of a node is OAS, the node represents Left = Right (or Left := Right when Colas is true). When the Op is OAS2, the node represents a multiple assignment such as x, y, z = a, b, c.
Suppose we have the statement a := b + c(6). Its abstract syntax tree is rooted at an OAS node whose Left is the ONAME node a and whose Right is an OADD node; the OADD node in turn has the ONAME node b on the left and the call c(6) on the right.
In the end, every declaration is built into an abstract syntax tree of this kind. That concludes the overview. The following sections look in more detail at how the various declarations are built into an abstract syntax tree step by step.
How the parsing phase represents declarations
To see more directly what the input to abstract syntax tree construction looks like, we can experiment with the packages that the Go standard library provides. I could not easily inspect what a declaration looks like after the compiler's own parser has processed it, so I will show it through the standard library instead.
Basic literal parsing
The basic literals are integer, float, complex, character, and string literals, plus identifiers. From the previous article on Go syntax parsing, we know that the structure of a basic literal in the Go compiler is:
BasicLit struct {
Value string // literal value
Kind LitKind // kind of basic literal: IntLit, FloatLit, ImagLit, RuneLit, or StringLit
Bad bool // true means the literal Value has syntax errors
expr
}
In the standard library, the structure of a basic literal looks like this:
BasicLit struct {
ValuePos token.Pos // literal position
Kind token.Token // token.INT, token.FLOAT, token.IMAG, token.CHAR, or token.STRING
Value string // literal string; e.g. 42, 0x7f, 3.14, 1e-9, 2.4i, 'a', '\x7f', "foo" or `\m\n\o`
}
The two are almost the same. This also holds for the other literal and declaration structures mentioned later: the structures in the Go compiler differ from those in the Go standard library, but their meanings are similar.
Knowing the structure of a basic literal, we can build one by hand like this:
func AstBasicLit() {
var basicLit = &ast.BasicLit{
Kind: token.INT,
Value: "666",
}
ast.Print(nil, basicLit)
}
// Output
*ast.BasicLit {
ValuePos: 0
Kind: INT
Value: "666"
}
The code above constructs a basic literal directly. In theory we could construct a complete syntax tree this way, but doing it by hand is far too tedious, so the standard library provides functions that build syntax trees automatically. For example, to turn the integer 666 into a basic literal structure:
func AstBasicLitCreat() {
expr, _ := parser.ParseExpr(`666`)
ast.Print(nil, expr)
}
// Output
*ast.BasicLit {
ValuePos: 1
Kind: INT
Value: "666"
}
Another example is the identifier, whose structure is:
type Ident struct {
NamePos token.Pos // identifier position
Name string // identifier name
Obj *Object // denoted object (kind and extra information); or nil
}
An identifier can be constructed as follows:
func AstIdent() {
ast.Print(nil, ast.NewIdent(`a`))
}
// Output
*ast.Ident {
NamePos: 0
Name: "a"
}
If the identifier is produced by parsing an expression, additional information about it is stored in the Obj field:
func AstIdentExpr() {
expr, _ := parser.ParseExpr(`a`)
ast.Print(nil, expr)
}
// Output
*ast.Ident {
NamePos: 1
Name: "a"
Obj: *ast.Object {
Kind: bad
Name: ""
}
}
The Kind field of the ast.Object structure describes what kind of entity the identifier denotes:
const (
Bad ObjKind = iota // for error handling
Pkg // package
Con // constant
Typ // type
Var // variable
Fun // function or method
Lbl // label
)
Expression parsing
The file go/ast/ast.go in the standard library defines the structures for the various kinds of expression. Here are a few of them:
// A SelectorExpr node represents an expression followed by a selector.
SelectorExpr struct {
X Expr // expression
Sel *Ident // field selector
}
// An IndexExpr node represents an expression followed by an index.
IndexExpr struct {
X Expr // expression
Lbrack token.Pos // position of "["
Index Expr // index expression
Rbrack token.Pos // position of "]"
}
// A SliceExpr node represents an expression followed by slice indices.
SliceExpr struct {
X Expr // expression
Lbrack token.Pos // position of "["
Low Expr // begin of slice range; or nil
High Expr // end of slice range; or nil
Max Expr // maximum capacity of slice; or nil
Slice3 bool // true if 3-index slice (2 colons present)
Rbrack token.Pos // position of "]"
}
The Go compiler defines similar expression structures in its syntax package, at: src/cmd/compile/internal/syntax/nodes.go
// X.Sel
SelectorExpr struct {
X Expr
Sel *Name
expr
}
// X[Index]
IndexExpr struct {
X Expr
Index Expr
expr
}
// X[Index[0] : Index[1] : Index[2]]
SliceExpr struct {
X Expr
Index [3]Expr
// Full indicates whether this is a simple or full slice expression.
// In a valid AST, this is equivalent to Index[2] != nil.
// TODO(mdempsky): This is only needed to report the "3-index
// slice of string" error when Index[2] is missing.
Full bool
expr
}
Although the structure definitions differ, the expressions mean much the same thing. The standard library defines many types for representing the various expressions:
type BadExpr struct{ ... }
type BinaryExpr struct{ ... }
type CallExpr struct{ ... }
type Expr interface{ ... }
type ExprStmt struct{ ... }
type IndexExpr struct{ ... }
type KeyValueExpr struct{ ... }
......
In the Go compiler, the core method for converting expressions is: src/cmd/compile/internal/gc/noder.go → expr()
func (p *noder) expr(expr syntax.Expr) *Node {
p.setlineno(expr)
switch expr := expr.(type) {
case nil, *syntax.BadExpr:
return nil
case *syntax.Name:
return p.mkname(expr)
case *syntax.BasicLit:
n := nodlit(p.basicLit(expr))
n.SetDiag(expr.Bad) // avoid follow-on errors if there was a syntax error
return n
case *syntax.CompositeLit:
n := p.nod(expr, OCOMPLIT, nil, nil)
if expr.Type != nil {
n.Right = p.expr(expr.Type)
}
l := p.exprs(expr.ElemList)
for i, e := range l {
l[i] = p.wrapname(expr.ElemList[i], e)
}
n.List.Set(l)
lineno = p.makeXPos(expr.Rbrace)
return n
case *syntax.KeyValueExpr:
// use position of expr.Key rather than of expr (which has position of ':')
return p.nod(expr.Key, OKEY, p.expr(expr.Key), p.wrapname(expr.Value, p.expr(expr.Value)))
case *syntax.FuncLit:
return p.funcLit(expr)
case *syntax.ParenExpr:
return p.nod(expr, OPAREN, p.expr(expr.X), nil)
case *syntax.SelectorExpr:
// parser.new_dotname
obj := p.expr(expr.X)
if obj.Op == OPACK {
obj.Name.SetUsed(true)
return importName(obj.Name.Pkg.Lookup(expr.Sel.Value))
}
n := nodSym(OXDOT, obj, p.name(expr.Sel))
n.Pos = p.pos(expr) // lineno may have been changed by p.expr(expr.X)
return n
case *syntax.IndexExpr:
return p.nod(expr, OINDEX, p.expr(expr.X), p.expr(expr.Index))
......
}
panic("unhandled Expr")
}
Let's again use the Go standard library to see what a binary expression looks like after it is parsed:
func AstBasicExpr() {
expr, _ := parser.ParseExpr(`6+7*8`)
ast.Print(nil, expr)
}
The top-level node of the result is a BinaryExpr, whose structure is:
// A BinaryExpr node represents a binary expression.
BinaryExpr struct {
X Expr // left operand
OpPos token.Pos // position of Op
Op token.Token // operator
Y Expr // right operand
}
Once the expression has been parsed into this structure, different nodes can be created according to the value of Op. As mentioned earlier, each Op has its own semantics.
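The nesting of BinaryExpr nodes is what encodes operator precedence. As a small sketch, the following code renders a parsed expression with every binary node fully parenthesized, which makes the tree shape of 6+7*8 visible; the helper names render and shape are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"log"
)

// render returns e with every binary expression wrapped in parentheses,
// so the nesting of BinaryExpr nodes can be read directly from the text.
func render(e ast.Expr) string {
	switch e := e.(type) {
	case *ast.BinaryExpr:
		return "(" + render(e.X) + " " + e.Op.String() + " " + render(e.Y) + ")"
	case *ast.ParenExpr:
		return render(e.X)
	case *ast.BasicLit:
		return e.Value
	case *ast.Ident:
		return e.Name
	}
	return "?" // expression kinds not handled by this sketch
}

// shape parses src as an expression and renders its tree shape.
func shape(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	return render(expr), nil
}

func main() {
	s, err := shape("6+7*8")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s) // (6 + (7 * 8))
}
```

The root is the + node and the * node is its right operand (Y), so the multiplication is evaluated first.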
Expression evaluation
Suppose we want to evaluate the binary expression above:
func AstBasicExpr() {
expr, _ := parser.ParseExpr(`6+7*8`)
fmt.Println(Eval(expr))
}
func Eval(exp ast.Expr) float64 {
switch exp := exp.(type) {
case *ast.BinaryExpr: // binary expression: evaluate it with EvalBinaryExpr
return EvalBinaryExpr(exp)
case *ast.BasicLit: // basic literal: convert its text to a number
f, _ := strconv.ParseFloat(exp.Value, 64)
return f
}
return 0
}
func EvalBinaryExpr(exp *ast.BinaryExpr) float64 { // only + and * are implemented here
switch exp.Op {
case token.ADD:
return Eval(exp.X) + Eval(exp.Y)
case token.MUL:
return Eval(exp.X) * Eval(exp.Y)
}
return 0
}
// Output
62
The important steps are commented, so the code should be easy to follow.
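The evaluator above silently returns 0 for anything it does not understand. A slightly more defensive variant, sketched below, reports an error instead and also handles parentheses, subtraction, and division. The names eval and evalString are assumptions made for this example, and the set of supported operators is an arbitrary choice.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
	"strconv"
)

// eval evaluates a small arithmetic subset of Go expressions and returns
// an error for anything outside that subset.
func eval(e ast.Expr) (float64, error) {
	switch e := e.(type) {
	case *ast.BasicLit:
		if e.Kind != token.INT && e.Kind != token.FLOAT {
			return 0, fmt.Errorf("unsupported literal %s", e.Value)
		}
		return strconv.ParseFloat(e.Value, 64)
	case *ast.ParenExpr:
		return eval(e.X)
	case *ast.BinaryExpr:
		x, err := eval(e.X)
		if err != nil {
			return 0, err
		}
		y, err := eval(e.Y)
		if err != nil {
			return 0, err
		}
		switch e.Op {
		case token.ADD:
			return x + y, nil
		case token.SUB:
			return x - y, nil
		case token.MUL:
			return x * y, nil
		case token.QUO:
			if y == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			return x / y, nil
		}
		return 0, fmt.Errorf("unsupported operator %s", e.Op)
	}
	return 0, fmt.Errorf("unsupported expression %T", e)
}

// evalString parses src as an expression and evaluates it.
func evalString(src string) (float64, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return 0, err
	}
	return eval(expr)
}

func main() {
	v, err := evalString("(6+7)*8")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(v) // 104
}
```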
Var declaration parsing
The first thing to note is that, as the previous article on Go syntax parsing showed, the compiler parses a var declaration into the VarDecl structure. In the Go standard library, however, the parser represents var, const, type, and import declarations with a single GenDecl structure (a "general declaration"):
// token.IMPORT *ImportSpec
// token.CONST *ValueSpec
// token.TYPE *TypeSpec
// token.VAR *ValueSpec
//
GenDecl struct {
Doc *CommentGroup // associated documentation; or nil
TokPos token.Pos // position of Tok
Tok token.Token // IMPORT, CONST, TYPE, VAR
Lparen token.Pos // position of '(', if any
Specs []Spec
Rparen token.Pos // position of ')', if any
}
The Tok field tells which kind of declaration a GenDecl is.
The following example shows what a var declaration looks like after it has been parsed:
const srcVar = `package test
var a = 6+7*8
`
func AstVar() {
fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "hello.go", srcVar, parser.AllErrors)
if err != nil {
log.Fatal(err)
}
for _, decl := range f.Decls {
if v, ok := decl.(*ast.GenDecl); ok {
fmt.Printf("Tok: %v\n", v.Tok)
for _, spec := range v.Specs {
ast.Print(nil, spec)
}
}
}
}
First, you can see that its Tok is var, which marks it as a var declaration. Its variable names and values are stored in an ast.ValueSpec structure, which can be thought of as the counterpart of the VarDecl structure in the Go compiler.
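Because GenDecl covers four kinds of declaration, code that walks File.Decls usually switches on both the declaration type and the spec type. The following sketch lists every top-level declaration in a small, made-up source file; the helper name describe is an assumption made for this example. The source only needs to parse, since go/parser does not type-check it.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

// src is an illustrative source file containing each declaration kind.
const src = `package demo

import "fmt"

const c = 1

var a, b = 6 + 7*8, "x"

type T struct{}

func main() { fmt.Println(a, b, c, T{}) }
`

// describe returns one line per declared name at the top level of src.
func describe(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range f.Decls {
		switch d := decl.(type) {
		case *ast.GenDecl: // import, const, type, or var
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.ImportSpec:
					out = append(out, "import "+s.Path.Value)
				case *ast.ValueSpec: // used for both const and var
					for _, n := range s.Names {
						out = append(out, d.Tok.String()+" "+n.Name)
					}
				case *ast.TypeSpec:
					out = append(out, "type "+s.Name.Name)
				}
			}
		case *ast.FuncDecl:
			out = append(out, "func "+d.Name.Name)
		}
	}
	return out, nil
}

func main() {
	lines, err := describe(src)
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Since const and var share the ValueSpec type, the example falls back on d.Tok to tell them apart, which is exactly the role of the Tok field described above.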
At this point you should have a general idea of what basic literals, expressions, and var declarations look like after syntax parsing. As mentioned in the overview, the abstract syntax tree stage converts the declarations in the Go source file into abstract syntax trees one by one: each import, var, type, const, or func declaration becomes a root node, and the child nodes of the declaration hang below that root. Let's take the var declaration as an example to see how it is handled.
Abstract Syntax Tree Construction
The construction process follows the same idea for every kind of declaration. The code is fairly involved, so I will not explain it line by line. You can read it yourself in: src/cmd/compile/internal/gc/noder.go → the implementation of decls()
I will use only the var declaration as an example to show how a declaration is handled in the abstract syntax tree construction phase.
Abstract Syntax Tree Construction for Var Declaration Statement
As mentioned earlier, the core logic of abstract syntax tree construction is in: src/cmd/compile/internal/gc/noder.go → decls. When the declaration type is *syntax.VarDecl, it calls the p.varDecl(decl) method to process the declaration:
func (p *noder) decls(decls []syntax.Decl) (l []*Node) {
var cs constState
for _, decl := range decls {
p.setlineno(decl)
switch decl := decl.(type) {
......
case *syntax.VarDecl:
l = append(l, p.varDecl(decl)...)
......
default:
panic("unhandled Decl")
}
}
return
}
Look directly at the internal implementation of p.varDecl(decl)
func (p *noder) varDecl(decl *syntax.VarDecl) []*Node {
names := p.declNames(decl.NameList) // handle the variable names
typ := p.typeExprOrNil(decl.Type) // handle the variable type
var exprs []*Node
if decl.Values != nil {
exprs = p.exprList(decl.Values) // handle the values
}
......
return variter(names, typ, exprs)
}
The code above shows only the core calls made by this method. The call chains are fairly deep, so I will summarize what each call does below.
Let's first review what the structure that holds the var declaration looks like
// NameList Type
// NameList Type = Values
// NameList = Values
VarDecl struct {
Group *Group // nil means not part of a group
Pragma Pragma
NameList []*Name
Type Expr // nil means no type
Values Expr // nil means no values
decl
}
The core fields are NameList, Type, and Values, and varDecl handles each of them with its own call before combining the results:
- names := p.declNames(decl.NameList) converts every variable name into a Node. As described earlier, the key field of Node is Op, and this method sets the Op of each name to ONAME. It returns a slice of Nodes containing all the variable names declared by the var statement.
- p.typeExprOrNil(decl.Type) converts the declared type (int, string, a slice type, and so on, as in var a int) into a Node. It is implemented mainly by calling expr(expr syntax.Expr), which maps each kind of expression to its Node through a large type switch.
- p.exprList(decl.Values) converts the value part into Nodes, dispatching on the kind of each value expression.
- variter(names, typ, exprs) combines the name part, the type part, and the value (or expression) part of the var declaration into a tree of Nodes, or a slice of such trees.
The first three calls convert each part of the var declaration into Nodes, which mostly means setting the Op of each Node, since every Op carries its own semantics. The fourth call then splices these Nodes into a tree according to those semantics, so that the tree correctly expresses the var declaration.
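The steps above can be imitated on top of the standard library. The following toy "noder" is a sketch only: it parses a var declaration with go/parser, maps names and value expressions to a simplified Node type, and then pairs each name with its value under an OAS node. The Node type, the functions expr, assign, and lowerVarDecls, and the restriction to four arithmetic operators are all assumptions made for illustration; the compiler's variter also handles types, grouped declarations, and function-local variables, which are ignored here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

// Node is a simplified, hypothetical stand-in for the compiler's Node.
type Node struct {
	Op          string // e.g. "ONAME", "OADD"
	Sym         string // name or literal text for leaf nodes
	Left, Right *Node
}

func (n *Node) String() string {
	if n == nil {
		return "nil"
	}
	if n.Left == nil && n.Right == nil {
		return n.Op + "(" + n.Sym + ")"
	}
	return n.Op + "(" + n.Left.String() + ", " + n.Right.String() + ")"
}

var binaryOps = map[token.Token]string{
	token.ADD: "OADD", token.SUB: "OSUB", token.MUL: "OMUL", token.QUO: "ODIV",
}

// expr plays the role of noder.expr: it maps a syntax node to a Node
// whose Op matches the expression kind.
func expr(e ast.Expr) *Node {
	switch e := e.(type) {
	case *ast.Ident:
		return &Node{Op: "ONAME", Sym: e.Name}
	case *ast.BasicLit:
		return &Node{Op: "OLITERAL", Sym: e.Value}
	case *ast.BinaryExpr:
		if op, ok := binaryOps[e.Op]; ok {
			return &Node{Op: op, Left: expr(e.X), Right: expr(e.Y)}
		}
	}
	return &Node{Op: "OXXX"} // anything this sketch does not model
}

// assign imitates the simplest case of variter: one OAS node per name,
// with the matching value on the right when there is one.
func assign(names, values []*Node) []*Node {
	var out []*Node
	for i, name := range names {
		var v *Node
		if i < len(values) {
			v = values[i]
		}
		out = append(out, &Node{Op: "OAS", Left: name, Right: v})
	}
	return out
}

// lowerVarDecls converts every top-level var declaration in src.
func lowerVarDecls(src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range f.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.VAR {
			continue
		}
		for _, spec := range gen.Specs {
			vs := spec.(*ast.ValueSpec) // var specs are always ValueSpecs
			var names, values []*Node
			for _, n := range vs.Names {
				names = append(names, &Node{Op: "ONAME", Sym: n.Name})
			}
			for _, v := range vs.Values {
				values = append(values, expr(v))
			}
			for _, n := range assign(names, values) {
				out = append(out, n.String())
			}
		}
	}
	return out, nil
}

func main() {
	trees, err := lowerVarDecls("package p\nvar a = 666+6\n")
	if err != nil {
		log.Fatal(err)
	}
	for _, t := range trees {
		fmt.Println(t) // OAS(ONAME(a), OADD(OLITERAL(666), OLITERAL(6)))
	}
}
```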
The following example shows the abstract syntax tree construction for a var declaration.
Example showing abstract syntax tree for var declaration
Suppose we have the var declaration below. I will first show what it looks like after parsing, using the parser provided in the standard library, and then show the result of building it into an abstract syntax tree.
const srcVar = `package test
var a = 666+6
`
func AstVar() {
fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "hello.go", srcVar, parser.AllErrors)
if err != nil {
log.Fatal(err)
}
for _, decl := range f.Decls {
if v, ok := decl.(*ast.GenDecl); ok {
fmt.Printf("Tok: %v\n", v.Tok)
for _, spec := range v.Specs {
ast.Print(nil, spec)
}
}
}
}
That is the result of syntax analysis. The three methods described above then convert Names, Type, and Values into Node structures as follows:
Names:
ONAME(a)
Values:
OLITERAL(666)
OADD(+)
OLITERAL(6)
The variter(names, typ, exprs) method then assembles these Nodes into a tree, with the name ONAME(a) on one side and the OADD expression on the other.
You can inspect the parsing result of any code in the following way:
const src = `your code here`
func Parser() {
fset := token.NewFileSet() // positions are relative to fset
f, err := parser.ParseFile(fset, "", src, 0)
if err != nil {
panic(err)
}
// Print the AST.
ast.Print(fset, f)
}
Summary
This article first looked at the abstract syntax tree as a whole: what it is for, how it is built, and what the declarations in a source file look like once they have been turned into an abstract syntax tree.
Then, using the parser provided in the standard library, it showed how basic literals, expressions, and var declarations are parsed. Finally, taking the var declaration as an example, it showed how a declaration is processed in the abstract syntax tree construction stage.
Lexical analysis, syntax analysis, abstract syntax tree construction, and the type checking covered in a later article all have many implementation details that cannot be presented here one by one. This series of articles aims to give readers a clear outline that they can follow when reading the details for themselves. For example, which forms of var declaration are valid and how import can be written can both be found in the underlying implementation of the Go compiler.