Go compilation principle series 10 (escape analysis)

Foreword

In the previous article, I covered one of the compiler's optimizations: function inlining. This article covers another one: escape analysis. Escape analysis is an important optimization phase in Go compilation. Its main job is to decide whether each variable should be allocated on the stack or on the heap.

Most of the overview below (including its examples) comes from the comments in the escape analysis source code, which lives in src/cmd/compile/internal/gc/escape.go. (These are probably the most thoroughly commented parts of the whole compiler, hahaha.)

Escape Analysis Overview

First, recall that in C/C++, if a function returns a pointer to an object on its stack, that stack frame is destroyed once the function returns, and any later access through the pointer touches memory that is no longer valid.

After reading this introduction to escape analysis in the Go compiler, you will see that the escape analysis phase decides whether each variable belongs on the heap or on the stack. Variables on the heap are freed automatically by the Go runtime's garbage collector. The compiler keeps variables on the stack whenever it can, because stack objects are destroyed automatically when the function call ends, which reduces the cost of runtime allocation and garbage collection.

Because of escape analysis, we as Go developers do not directly control where the variables or objects we define are allocated: they may end up on the stack or on the heap. This holds even for objects created with new or make.

Allocation follows two principles:

A pointer to a stack object cannot be stored on the heap (the stack memory is reclaimed once the function returns)
A pointer to a stack object cannot outlive that object (once the object is destroyed, the pointer would refer to invalid memory)
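To see both principles in action, here is a small sketch (not from the original article) that counts heap allocations with testing.AllocsPerRun from the standard library. The function names are made up for illustration, and //go:noinline is used so that inlining into the caller does not change where the variables end up.

```go
package main

import (
	"fmt"
	"testing"
)

// global outlives every function call, like the global pointer in the example below.
var global *[8]int

// returnPointer returns the address of a local array, so x would outlive
// the function's stack frame (second principle) and is moved to the heap.
//
//go:noinline
func returnPointer() *[8]int {
	var x [8]int
	return &x
}

// storeInGlobal saves the address of a local array in a global variable,
// which also outlives the call, so x is moved to the heap.
//
//go:noinline
func storeInGlobal() {
	var x [8]int
	global = &x
}

// localOnly creates an object with new, but the pointer never leaves the
// function, so escape analysis can keep the object on the stack.
//
//go:noinline
func localOnly() int {
	p := new([8]int)
	p[0] = 42
	return p[0]
}

func main() {
	// testing.AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("returnPointer:", testing.AllocsPerRun(100, func() { _ = returnPointer() }))
	fmt.Println("storeInGlobal:", testing.AllocsPerRun(100, storeInGlobal))
	fmt.Println("localOnly:", testing.AllocsPerRun(100, func() { _ = localOnly() }))
}
```

With the standard gc toolchain this should report one allocation per call for the first two functions and none for localOnly, even though localOnly uses new.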

Below is a simple example of a variable escaping:

package main

var a *int

func main() {
    b := 1
    a = &b
}

In this example, a is a global pointer to an int, and inside main it is assigned the address of b. If b were allocated on the stack, this would violate the second principle: a would outlive b. So b must be allocated on the heap. You can view the escape information with the following command (for this program it reports "moved to heap: b"; inside a module, go build -gcflags=-m gives the same information):

go tool compile -m xxx.go

During escape analysis, the Go compiler builds a weighted directed graph whose edge weights record how many times a value is dereferenced, or has its address taken, in each assignment. In the examples below, the comment gives the weight of the edge from q to p: a weight greater than 0 means the assignment dereferences (*) q that many times, and a weight of -1 means the assignment takes an address (&).

p = &q // -1
p = q // 0
p = *q // 1
p = **q // 2
p = **&**&q // 2
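The rule behind these comments is simple arithmetic: each * adds 1 and each & subtracts 1. The sketch below applies it to the expressions above written as strings. It is only an illustration (edgeWeight is a hypothetical helper); the compiler performs the same count on syntax-tree nodes, not on text.

```go
package main

import "fmt"

// edgeWeight computes the weight of an assignment edge "p = <expr>":
// each leading '*' adds 1 (a dereference) and each '&' subtracts 1 (taking an
// address). expr is assumed to be a chain of '*' and '&' applied to one name.
func edgeWeight(expr string) int {
	weight := 0
	for _, c := range expr {
		switch c {
		case '*':
			weight++
		case '&':
			weight--
		default:
			// Reached the variable name: no more operators to count.
			return weight
		}
	}
	return weight
}

func main() {
	for _, expr := range []string{"&q", "q", "*q", "**q", "**&**&q"} {
		fmt.Printf("p = %-8s // %d\n", expr, edgeWeight(expr))
	}
}
```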

A weight of -1 does not necessarily mean that something escapes. In the following example, a holds the address of b, but a does not outlive b, so neither b nor a needs to escape.

func test() int {
    b := 666
    a := &b

    return *a
}

The following example shows the weighted directed graph that the compiler builds:

package main

var o *int

func main() {
    l := new(int)
    *l = 42
    m := &l
    n := &m
    o = **n
}

The compiler's data-flow analysis turns this code into the following weighted directed graph (each edge points in the direction of the assignment and is labeled with its weight):

new(int) --(-1)--> l --(-1)--> m --(-1)--> n --(2)--> o

In this graph, nodes represent variables, edges represent assignments between variables, arrows point in the direction of the assignment, and the number on each edge counts the dereferences (positive) or address-of operations (-1) in that assignment. A node's weight is the weight of the node its arrow points to plus the number on that arrow. Starting from o with weight 0, node n has weight 0+2=2, node m has weight 2-1=1, and node l has weight 1-1=0.

The point of traversing the graph and computing these weights is to find nodes with a weight of -1, such as the new(int) node (weight 0-1=-1). A weight of -1 means that node's address flows to the root node o, so the compiler must check whether it escapes. o is a global variable and cannot live on the stack, so by the allocation principles above, the object created by new(int) must be allocated on the heap.
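The weight calculation for this example can be reproduced in a few lines. The sketch below is a hypothetical model, not compiler code: it stores the four assignment edges, walks backwards from the root o adding up edge weights, and flags nodes that end up below 0. It assumes the graph has no cycles and each node has a single outgoing edge, which is true for this example.

```go
package main

import "fmt"

// edge is one assignment in the example: the value of src flows into dst,
// and weight is the number on the arrow (+1 per '*', -1 per '&').
type edge struct {
	src, dst string
	weight   int
}

// exampleEdges encodes the example program above.
var exampleEdges = []edge{
	{"new(int)", "l", -1}, // l := new(int)
	{"l", "m", -1},        // m := &l
	{"m", "n", -1},        // n := &m
	{"n", "o", 2},         // o = **n
}

// nodeWeights walks backwards from root. Each node's weight is the weight of
// the node its edge points to plus the number on that edge. It assumes an
// acyclic graph where every node has one outgoing edge.
func nodeWeights(root string, edges []edge) map[string]int {
	weights := map[string]int{root: 0}
	todo := []string{root}
	for len(todo) > 0 {
		dst := todo[len(todo)-1]
		todo = todo[:len(todo)-1]
		for _, e := range edges {
			if e.dst == dst {
				weights[e.src] = weights[dst] + e.weight
				todo = append(todo, e.src)
			}
		}
	}
	return weights
}

func main() {
	weights := nodeWeights("o", exampleEdges)
	for _, name := range []string{"o", "n", "m", "l", "new(int)"} {
		fmt.Printf("%-8s weight %d\n", name, weights[name])
		if weights[name] < 0 {
			// The node's address reaches o, a global, so it must live on the heap.
			fmt.Printf("  -> address of %s flows to o: allocate on the heap\n", name)
		}
	}
}
```

It prints weights 0, 2, 1, 0 and -1 for o, n, m, l and new(int), matching the numbers above, and flags only new(int).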

Real programs are more complicated: a node may have multiple edges (for example, a struct), and there may be cycles between nodes. The Go compiler therefore uses the Bellman-Ford algorithm (a single-source shortest-path algorithm) to traverse the directed graph and find the nodes whose weight is less than 0.
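To show why cycles and multiple edges are not a problem, here is a sketch of a worklist relaxation in the spirit of Bellman-Ford, loosely modeled on how escape.go walks the graph backwards from a root. The graph type, walkFromRoot and the example variables are all hypothetical, and the root g is assumed to be a global, so every node whose address reaches it must be heap allocated. One detail matters: when a node's weight drops below 0, the walk continues from it with weight 0, because in code like g = &k; k = v only k's address reaches g, not v's.

```go
package main

import (
	"fmt"
	"sort"
)

// inEdge says that the value of src flows into the location owning the edge;
// derefs is +1 per '*' and -1 per '&' in that assignment.
type inEdge struct {
	src    string
	derefs int
}

// graph maps each location to the edges flowing into it.
type graph map[string][]inEdge

// example models this hypothetical code, where g is a global *int:
//
//	p := &x; q := p; p = q; g = q   // x's address reaches g through a cycle
//	g = &k; k = v                   // k's address reaches g, v's does not
//	r := &y                         // unrelated to g
var example = graph{
	"g": {{"q", 0}, {"k", -1}},
	"q": {{"p", 0}},
	"p": {{"x", -1}, {"q", 0}},
	"k": {{"v", 0}},
	"r": {{"y", -1}},
}

// walkFromRoot computes the minimum derefs from every location to root with a
// worklist relaxation (a Bellman-Ford-style shortest-path search). It returns
// those derefs and the set of locations whose address flows to root.
func walkFromRoot(g graph, root string) (map[string]int, map[string]bool) {
	derefs := map[string]int{root: 0}
	addrFlows := map[string]bool{}
	todo := []string{root}
	for len(todo) > 0 {
		l := todo[len(todo)-1]
		todo = todo[:len(todo)-1]

		base := derefs[l]
		if base < 0 {
			// l's address reaches root. Continue from 0: for "g = &k; k = v"
			// only k's address reaches g, not v's.
			addrFlows[l] = true
			base = 0
		}
		for _, e := range g[l] {
			d := base + e.derefs
			// Relax: keep the smallest derefs seen so far for e.src.
			if old, seen := derefs[e.src]; !seen || d < old {
				derefs[e.src] = d
				todo = append(todo, e.src)
			}
		}
	}
	return derefs, addrFlows
}

func main() {
	derefs, addrFlows := walkFromRoot(example, "g")
	var escaped []string
	for name := range addrFlows {
		escaped = append(escaped, name)
	}
	sort.Strings(escaped)
	// g is a global, so every location whose address reaches it goes to the heap.
	fmt.Println("heap allocated:", escaped)
	fmt.Println("derefs:", derefs)
}
```

Because every edge weight is at least -1 and negative nodes are restarted at 0, no value can drop below -1, so each node is only updated a bounded number of times and the p/q cycle cannot loop forever. Here x and k escape, while v, p and q only have their values copied toward g, and y never reaches g at all.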

The core escape analysis code is in src/cmd/compile/internal/gc/escape.go. Let's take a brief look at it.

The underlying implementation of escape analysis

Continuing through the compiler's entry file (src/cmd/compile/internal/gc/main.go), you will find the following code:

// Phase 6: Escape analysis.
timings.Start("fe", "escapes")
escapes(xtop)

It calls the escapes function to perform escape analysis. Here is its implementation:

func escapes(all []*Node) {
    visitBottomUp(all, escapeFuncs)
}

It only calls visitBottomUp. Sound familiar? Yes, the same function was used for function inlining in the previous article. It walks the static call graph depth first and hands functions to a callback from the bottom of the call graph upward: callees come before their callers, mutually recursive functions are grouped together, and closures stay in the same group as their enclosing function. For escape analysis, the callback is escapeFuncs, the second argument to visitBottomUp.
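The bottom-up grouping comes from finding strongly connected components of the call graph; the comments in the compiler's scc.go name Tarjan's algorithm for this. The sketch below is a simplified, hypothetical model of that traversal, not the compiler's code: functions are plain strings, the call graph is a map, and closures are ignored. Each group passed to analyze contains only functions that call each other or functions from earlier groups.

```go
package main

import "fmt"

// exampleFuncs and exampleCalls form a hypothetical static call graph:
// main calls a and b, a calls helper, and b and c call each other.
var exampleFuncs = []string{"main", "a", "b", "c", "helper"}
var exampleCalls = map[string][]string{
	"main": {"a", "b"},
	"a":    {"helper"},
	"b":    {"c"},
	"c":    {"b"},
}

func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// visitBottomUpSketch finds strongly connected components with Tarjan's
// algorithm and calls analyze on each one, callees before callers.
func visitBottomUpSketch(funcs []string, calls map[string][]string, analyze func(group []string, recursive bool)) {
	index := map[string]int{} // DFS visit order
	low := map[string]int{}   // smallest index reachable while on the stack
	onStack := map[string]bool{}
	var stack []string
	next := 0

	var visit func(fn string)
	visit = func(fn string) {
		index[fn], low[fn] = next, next
		next++
		stack = append(stack, fn)
		onStack[fn] = true

		for _, callee := range calls[fn] {
			if _, seen := index[callee]; !seen {
				visit(callee)
				low[fn] = minInt(low[fn], low[callee])
			} else if onStack[callee] {
				low[fn] = minInt(low[fn], index[callee])
			}
		}
		if low[fn] != index[fn] {
			return // fn belongs to a component rooted further down the stack
		}
		// fn roots a component: pop it together with everything above it.
		var group []string
		for {
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			onStack[top] = false
			group = append(group, top)
			if top == fn {
				break
			}
		}
		recursive := len(group) > 1
		for _, callee := range calls[fn] {
			if callee == fn {
				recursive = true // direct self-recursion
			}
		}
		analyze(group, recursive)
	}

	for _, fn := range funcs {
		if _, seen := index[fn]; !seen {
			visit(fn)
		}
	}
}

func main() {
	visitBottomUpSketch(exampleFuncs, exampleCalls, func(group []string, recursive bool) {
		fmt.Println(group, "recursive:", recursive)
	})
}
```

Running it prints the groups [helper], [a], [c b] (marked recursive) and [main], in that order: helper is analyzed before a, and the mutually recursive pair b and c is analyzed together before main, which calls them.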

Next, let's look at how escapeFuncs is implemented:

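func escapeFuncs(fns []*Node, recursive bool) {
    // Every node in the batch must be a function declaration.
    for _, fn := range fns {
        if fn.Op != ODCLFUNC {
            Fatalf("unexpected node: %v", fn)
        }
    }

    var e Escape
    // heapLoc stands for the heap; anything whose address flows into it escapes.
    e.heapLoc.escapes = true

    // Create graph locations for each function.
    for _, fn := range fns {
        e.initFunc(fn)
    }
    // Walk each function body.
    for _, fn := range fns {
        e.walkFunc(fn)
    }
    e.curfn = nil

    // Propagate through the graph, then record the results.
    e.walkAll()
    e.finish(fns)
}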

There is very little code here: it mainly calls initFunc, walkFunc, walkAll and finish. What does each of them do? A short summary follows; for the implementation details, read the source yourself.

initFunc: creates a graph node (a "location") for each local variable the function declares. These are the nodes of the weighted directed graph mentioned earlier, built from the syntax tree
walkFunc: first scans the function body for OLABEL and OGOTO nodes and marks labels that are the target of a backward goto (that is, labels that start a loop), then walks the body and adds the assignment edges of the data-flow graph
walkAll: computes the minimum dereference count of every node in the weighted directed graph. Its implementation uses the Bellman-Ford algorithm mentioned above (I don't know this algorithm well; if you are interested, its Wikipedia article is a good introduction)
finish: updates the Esc field and related fields of the corresponding AST nodes according to the escape analysis results

书旅
