Foreword

In the previous article I covered variable capture, one part of compiler optimization. This article covers another part: function inlining. Function inlining means placing the body of a small function directly into its caller, which removes the overhead of the function call.

Function inlining overview

Every function call in a high-level programming language has a cost: stack memory has to be set up to hold its parameters, return values, local variables, and so on. In Go, the cost of a call consists of copying the parameters and return values on the stack (which is relatively small), some stack-register bookkeeping, and the stack-growth check in the function prologue. That check exists because Go stacks can grow dynamically. Go does not allocate stack memory gradually; it allocates a block at one time, and to avoid out-of-bounds access the prologue checks whether the allocated stack is still large enough. If it is not, the runtime allocates a sufficiently large new stack and copies the contents of the original stack into it.

The code below uses a Go benchmark to measure the improvement that function inlining brings:

 package main

import "testing"

// The directive below disables inlining. To enable inlining, remove that line.
//go:noinline
func max(a, b int) int {
    if a > b {
        return a
    }

    return b
}

var Result int

func BenchmarkMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = max(-1, i)
    }
    Result = r
}
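To compare both settings in a single run, the two variants can sit side by side in an ordinary program. The following is a minimal sketch, not the author's setup: the names maxNoInline, maxInline, benchNoInline and benchInline are made up for illustration, and it relies on testing.Benchmark, which lets a benchmark function run outside of go test. Absolute numbers depend on the machine and the Go version, so none are shown here.

```go
package main

import (
	"fmt"
	"testing"
)

// maxNoInline is kept out of line by the directive below.
//
//go:noinline
func maxNoInline(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// maxInline has the same body but no directive, so the compiler may inline it.
func maxInline(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Result is a package-level sink that keeps the loops from being discarded.
var Result int

func benchNoInline(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = maxNoInline(-1, i)
	}
	Result = r
}

func benchInline(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = maxInline(-1, i)
	}
	Result = r
}

func main() {
	// Each call runs the loop with a growing b.N until the timing is stable.
	fmt.Println("noinline:", testing.Benchmark(benchNoInline))
	fmt.Println("inline:  ", testing.Benchmark(benchInline))
}
```

Each benchmark calls its function directly rather than through a function value, because an indirect call would hide the callee from the inliner and make both variants look the same.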


During compilation, the Go compiler estimates the cost of inlining each function, so only simple functions are inlined. As the source code walkthrough later in this article shows, a function is not inlined in the following cases:

  • The function is recursive (it calls itself)
  • The function is preceded by one of these directives: go:noinline , go:norace (when compiling with -race), go:nocheckptr (when compiling with -d=checkptr), go:uintptrescapes
  • The function has no body
  • The function is too complex: its estimated cost exceeds the inlining budget ( inlineMaxBudget , which is 80 in my Go version, 1.16.6), so a function with too many statements is not inlined. Separately, a caller whose abstract syntax tree has more than 5000 nodes is treated as "big", and only very cheap callees are inlined into it
  • The function contains a closure ( OCLOSURE ), range ( ORANGE ), select ( OSELECT ), go ( OGO ), defer ( ODEFER ), a type declaration ( ODCLTYPE ), or a return that jumps to another function ( ORETJMP )
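The cases above can be made concrete with a few small functions. This is an illustrative sketch only: the function names are invented, and the comments describe what the rules listed above predict for Go 1.16. Newer compiler versions have relaxed some of these rules, so compiling with -m=2 on your own toolchain is the reliable check.

```go
package main

import "fmt"

// add is small and contains no disqualifying construct: an inlining candidate.
func add(a, b int) int { return a + b }

// fib calls itself, so it falls under the "recursive" case.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// pinned is kept out of line by the go:noinline directive.
//
//go:noinline
func pinned(a int) int { return a * 2 }

// withDefer contains both a defer statement (ODEFER) and a closure (OCLOSURE).
func withDefer(a int) (r int) {
	defer func() { r++ }()
	return a
}

// sumRange contains a range loop (ORANGE).
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(add(1, 2), fib(10), pinned(21), withDefer(1), sumRange([]int{1, 2, 3}))
}
```

Whether a function is inlined never changes its result, which is why the only way to observe the difference is the compiler's own diagnostics or a benchmark.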

We can also control inlining with flags when building or compiling. To prevent every function in the program from being inlined:

 go build -gcflags="-l" xxx.go
go tool compile -l xxx.go

Likewise, at compile time we can see which functions are inlined, which are not, and why:

 go tool compile -m=2 xxx.go

Here is an example:

 package main

func test1(a, b int) int {
    return a + b
}

func step(n int) int {
    if n < 2 {
        return n
    }
    return step(n-1) + step(n-2)
}

func main() {
    test1(1, 2)
    step(5)
}

Compiling this with -m=2 shows that test1 can be inlined, because its body is very simple, and that step cannot be inlined, because it is a recursive function.

Low-level implementation of function inlining

The call chain behind each of these functions is very deep, so I will not explain the code line by line; I will only introduce the core functions. Interested readers can step through the code in a debugger (see the related article, "Go source code debugging method").

We start again from the Go compiler entry file, which has come up many times in earlier articles. It contains the following code:

 // Go compiler entry file: src/cmd/compile/main.go -> gc.Main(archInit)

// Phase 5: Inlining
if Debug.l != 0 {
    // Find functions that can be inlined
    visitBottomUp(xtop, func(list []*Node, recursive bool) {
        numfns := numNonClosures(list)
        for _, n := range list {
            if !recursive || numfns > 1 {
                caninl(n)
            } else {
                ......
            }
            inlcalls(n)
        }
    })
}

for _, n := range xtop {
    if n.Op == ODCLFUNC {
        devirtualize(n)
    }
}

Let's take a look at what each method does

visitBottomUp

The function takes two parameters:

  • xtop : we have seen it before; it holds the root nodes of the abstract syntax trees of the top-level declarations
  • The second parameter is a function, which itself takes two parameters: a slice of abstract syntax tree root nodes for function declarations, and a bool that is true if those functions are recursive and false otherwise

visitBottomUp mainly traverses xtop and calls the visit method on the root node of each abstract syntax tree, but only for trees that are function declarations:

 func visitBottomUp(list []*Node, analyze func(list []*Node, recursive bool)) {
    var v bottomUpVisitor
    v.analyze = analyze
    v.nodeID = make(map[*Node]uint32)
    for _, n := range list {
        if n.Op == ODCLFUNC && !n.Func.IsHiddenClosure() { // a function declaration that is not a closure
            v.visit(n)
        }
    }
}

The core of the visit method is a call to inspectList, which traverses the abstract syntax tree depth-first and passes each node to the function given as its second argument. That function checks, for example, whether the function being visited contains a recursive call (see the switch statement below):

 func (v *bottomUpVisitor) visit(n *Node) uint32 {
    if id := v.nodeID[n]; id > 0 {
        // already visited
        return id
    }

    ......
    v.stack = append(v.stack, n)

    inspectList(n.Nbody, func(n *Node) bool {
        switch n.Op {
        case ONAME:
            if n.Class() == PFUNC {
                ......
            }
        case ODOTMETH:
            fn := asNode(n.Type.Nname())
            ......
        case OCALLPART:
            fn := asNode(callpartMethod(n).Type.Nname())
            ......
        case OCLOSURE:
            if m := v.visit(n.Func.Closure); m < min {
                min = m
            }
        }
        return true
    })

    ......
        v.analyze(block, recursive)
    }

    return min
}

Next, the function passed as the second argument to visitBottomUp decides whether each function can be inlined and then performs the inlining. These two steps are implemented by caninl and inlcalls.
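The bottom-up traversal can be sketched on a toy call graph. This is a simplified model under stated assumptions, not the compiler's code: functions are plain names, the call graph is a hypothetical map from caller to callees, and the names visitor, bottomUp and done are invented for the sketch. It follows the same idea as visit: number each function, visit callees first, and hand each finished group of functions to analyze together with a flag saying whether the group is recursive.

```go
package main

import "fmt"

// done marks a function whose group has already been analyzed.
const done = int(^uint(0) >> 1)

// visitor is a toy counterpart of bottomUpVisitor that works on function names.
type visitor struct {
	calls   map[string][]string // hypothetical call graph: caller -> callees
	analyze func(group []string, recursive bool)
	id      map[string]int
	nextID  int
	stack   []string
}

// visit numbers fn, visits its callees first, and returns the smallest ID
// reachable from fn that belongs to a function still on the stack.
func (v *visitor) visit(fn string) int {
	if id := v.id[fn]; id > 0 {
		return id // already visited
	}
	v.nextID++
	id := v.nextID
	v.id[fn] = id
	v.nextID++
	low := v.nextID // id+1 means "no call leads back to fn"
	v.stack = append(v.stack, fn)

	for _, callee := range v.calls[fn] {
		if m := v.visit(callee); m < low {
			low = m
		}
	}

	if low == id || low == id+1 {
		// fn is the root of a group of functions that call each other.
		recursive := low == id
		i := len(v.stack) - 1
		for v.stack[i] != fn {
			v.id[v.stack[i]] = done
			i--
		}
		v.id[fn] = done
		group := append([]string(nil), v.stack[i:]...)
		v.stack = v.stack[:i]
		v.analyze(group, recursive)
	}
	return low
}

// bottomUp visits every root and records each group, callees before callers.
func bottomUp(roots []string, calls map[string][]string) []string {
	var out []string
	v := &visitor{calls: calls, id: make(map[string]int)}
	v.analyze = func(group []string, recursive bool) {
		out = append(out, fmt.Sprintf("%v recursive=%v", group, recursive))
	}
	for _, fn := range roots {
		v.visit(fn)
	}
	return out
}

func main() {
	calls := map[string][]string{
		"main": {"add", "fact"},
		"fact": {"fact"},
		"ping": {"pong"},
		"pong": {"ping"},
	}
	for _, line := range bottomUp([]string{"main", "ping"}, calls) {
		fmt.Println(line)
	}
}
```

For this graph the groups come out as add, fact, main, and then ping together with pong. Only fact and the ping/pong pair are flagged as recursive. In the compiler excerpt shown earlier, a recursive group with a single function is skipped, while a recursive group spanning more than one function is still passed to caninl.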

caninl

This function checks whether the abstract syntax tree of a function declaration can be inlined.

Its implementation is straightforward. It first runs through a series of if statements that check, among other things, whether the function is marked with a directive such as go:noinline.

 func caninl(fn *Node) {
    if fn.Op != ODCLFUNC {
        Fatalf("caninl %v", fn)
    }
    if fn.Func.Nname == nil {
        Fatalf("caninl no nname %+v", fn)
    }

    var reason string // reason, if any, that the function was not inlined
    ......

    // If marked "go:noinline", don't inline
    if fn.Func.Pragma&Noinline != 0 {
        reason = "marked go:noinline"
        return
    }

    // If marked "go:norace" and -race compilation, don't inline.
    if flag_race && fn.Func.Pragma&Norace != 0 {
        reason = "marked go:norace with -race compilation"
        return
    }

    ......

    // If fn has no body (is defined outside of Go), cannot inline it.
    if fn.Nbody.Len() == 0 {
        reason = "no function body"
        return
    }

    visitor := hairyVisitor{
        budget:        inlineMaxBudget,
        extraCallCost: cc,
        usedLocals:    make(map[*Node]bool),
    }
    if visitor.visitList(fn.Nbody) {
        reason = visitor.reason
        return
    }
    if visitor.budget < 0 {
        reason = fmt.Sprintf("function too complex: cost %d exceeds budget %d", inlineMaxBudget-visitor.budget, inlineMaxBudget)
        return
    }

    n.Func.Inl = &Inline{
        Cost: inlineMaxBudget - visitor.budget,
        Dcl:  inlcopylist(pruneUnusedAutos(n.Name.Defn.Func.Dcl, &visitor)),
        Body: inlcopylist(fn.Nbody.Slice()),
    }
    ......
}

The other key method here is visitList, which checks whether the function body contains any of the statements mentioned above (go, select, range, and so on) and keeps track of the remaining inlining budget. For a function that satisfies the inlining conditions, caninl then fills in the Inl field of the function's syntax tree node with the cost and with copies of the function's declarations and body.
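The idea of walking a function body with a budget can be sketched with the standard go/ast package. This is a rough analogy under clear assumptions, not how the compiler does it: the compiler works on its own internal node types and cost rules, so the node counts here are not comparable to real inlining costs, and check, report, budget and demo are names invented for this sketch.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// budget is a stand-in for inlineMaxBudget (assumption: one unit per AST node).
const budget = 80

// check returns "" if fn passes this toy test, or a reason why it does not.
func check(fn *ast.FuncDecl) string {
	if fn.Body == nil {
		return "no function body"
	}
	remaining := budget
	reason := ""
	ast.Inspect(fn.Body, func(n ast.Node) bool {
		if n == nil || reason != "" {
			return false
		}
		switch n.(type) {
		case *ast.FuncLit, *ast.RangeStmt, *ast.SelectStmt, *ast.GoStmt, *ast.DeferStmt:
			reason = fmt.Sprintf("unhandled construct %T", n)
			return false
		}
		remaining--
		if remaining < 0 {
			reason = "function too complex"
			return false
		}
		return true
	})
	return reason
}

// report parses src and returns one verdict line per function declaration.
func report(src string) ([]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		if reason := check(fn); reason != "" {
			lines = append(lines, fn.Name.Name+": cannot inline: "+reason)
		} else {
			lines = append(lines, fn.Name.Name+": can inline")
		}
	}
	return lines, nil
}

// demo is a small input used only for this sketch.
const demo = `package demo

func add(a, b int) int { return a + b }

func cleanup(f func()) {
	defer f()
}
`

func main() {
	lines, err := report(demo)
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

As in caninl, there are two separate ways to fail: hitting a construct that is rejected outright, or running out of budget.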

inlcalls

This function performs the actual inlining, for example turning the callee's parameters and return values into declaration statements in the caller. The calls and implementation involved are fairly complicated, so I will not paste the code here; you can read it yourself. The core functions of function inlining are in the following file:

 src/cmd/compile/internal/gc/inl.go
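To picture what that transformation produces, here is a hand-written approximation. It is a sketch of the idea only, assuming the shape described above (parameters and the return value become local declarations in the caller); the real rewrite happens on the compiler's syntax tree, and callerWithCall and callerInlined are invented names.

```go
package main

import "fmt"

func max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// callerWithCall is what the programmer writes.
func callerWithCall(x int) int {
	return max(-1, x) + 1
}

// callerInlined approximates the caller after max has been inlined by hand.
func callerInlined(x int) int {
	a, b := -1, x // the callee's parameters become local variables
	var r int     // the callee's return value becomes a temporary
	if a > b {
		r = a // "return a" becomes an assignment plus a jump past the body
		goto done
	}
	r = b
done:
	return r + 1
}

func main() {
	fmt.Println(callerWithCall(3), callerInlined(3))
}
```

Both functions compute the same value for every input; the inlined form simply has no call instruction, no argument copying, and no stack check for max.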

书旅