Search for [脑子进煎鱼了 / "brain into the fried fish"] on WeChat to follow fried fish. This article has been included in my GitHub repo github.com/eddycjy/blog, together with my series of articles, materials, and open source Go books.

Hello everyone, I am fried fish.

defer is a very interesting keyword in the Go language. Here is an example:

package main

import "fmt"

func main() {
    defer fmt.Println("煎鱼了")

    fmt.Println("脑子进")
}

The output is:

脑子进
煎鱼了

A few days ago, some friends in my reader group discussed the following question:

(Screenshot from the reader group chat)

Simply put, the question is: if defer is called inside a for loop, will it cause any performance impact?

The concern arises because defer is implemented as a linked list in the Go runtime's underlying data structures:

(Basic underlying structure of defer)

Everyone is worried that if the loop count is too large, the defer linked list will become very long, which hardly "strives for perfection". Or has the Go team, much like Redis with its data structure designs, already optimized defer internally so that there is no big impact?

In today's article, we will explore calling Go's defer in a loop. If the underlying linked list becomes too long, will it cause any problems? If so, what are the specific effects?

Let's begin the fishy journey.

A 30% defer performance optimization

Back in Go 1.13, a round of performance optimization was done on defer, improving defer's performance by 30% in most scenarios:

(Go 1.13 defer optimization changelog)

Let's review the changes in Go 1.13 and see where defer was optimized. This is the crux of the problem.

Before and after

In Go 1.12 and earlier, the assembly generated for a defer call looked like this:

    0x0070 00112 (main.go:6)    CALL    runtime.deferproc(SB)
    0x0075 00117 (main.go:6)    TESTL    AX, AX
    0x0077 00119 (main.go:6)    JNE    137
    0x0079 00121 (main.go:7)    XCHGL    AX, AX
    0x007a 00122 (main.go:7)    CALL    runtime.deferreturn(SB)
    0x007f 00127 (main.go:7)    MOVQ    56(SP), BP

In Go 1.13 and later, the assembly generated for a defer call looks like this:

    0x006e 00110 (main.go:4)    MOVQ    AX, (SP)
    0x0072 00114 (main.go:4)    CALL    runtime.deferprocStack(SB)
    0x0077 00119 (main.go:4)    TESTL    AX, AX
    0x0079 00121 (main.go:4)    JNE    139
    0x007b 00123 (main.go:7)    XCHGL    AX, AX
    0x007c 00124 (main.go:7)    CALL    runtime.deferreturn(SB)
    0x0081 00129 (main.go:7)    MOVQ    112(SP), BP

From the assembly point of view, the original call to runtime.deferproc seems to have been changed to a call to runtime.deferprocStack. What optimization lies behind this?

Let's hold that question and read on.

The smallest unit of defer: _defer

Compared with the previous version, the _defer struct, the smallest unit of Go defer, mainly adds a heap field:

type _defer struct {
    siz     int32 // includes both arguments and results
    started bool
    heap    bool
    sp      uintptr // sp at time of defer
    pc      uintptr
    fn      *funcval
    ...
}

This field identifies whether the _defer is allocated on the heap or on the stack. The remaining fields have no obvious changes, so we can focus on the new deferprocStack method and see what it does.

deferprocStack

func deferprocStack(d *_defer) {
    gp := getg()
    if gp.m.curg != gp {
        throw("defer on system stack")
    }
    
    d.started = false
    d.heap = false
    d.sp = getcallersp()
    d.pc = getcallerpc()

    *(*uintptr)(unsafe.Pointer(&d._panic)) = 0
    *(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
    *(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

    return0()
}

This block of code is fairly conventional. It mainly obtains the stack pointer of the function calling defer, the specific addresses of the arguments passed into the function, and the PC (program counter). I covered these in detail in the earlier article "Understanding Go defer in depth", so I won't repeat them here.

So what is special about deferprocStack? You can see that it sets d.heap to false, which means the deferprocStack method handles the scenario where _defer is allocated on the stack.

deferproc

The question is: where is the heap-allocation scenario handled?

func newdefer(siz int32) *_defer {
    ...
    d.heap = true
    d.link = gp._defer
    gp._defer = d
    return d
}

And where exactly is newdefer called? As follows:

func deferproc(siz int32, fn *funcval) { // arguments of fn follow fn
    ...
    sp := getcallersp()
    argp := uintptr(unsafe.Pointer(&fn)) + unsafe.Sizeof(fn)
    callerpc := getcallerpc()

    d := newdefer(siz)
    ...
}

It is very clear that the deferproc method, which was always called in previous versions, now corresponds to the heap-allocation scenario.

Summary

  • What is certain is that deferproc has not been removed; rather, the overall process has been optimized.
  • The Go compiler chooses between deferproc and deferprocStack depending on the scenario, corresponding to heap allocation and stack allocation respectively.

Where is the optimization?

The main optimization lies in the change to the allocation rules for defer objects: the compiler decides how to allocate the _defer record based on the loop depth at which the defer statement appears.

// src/cmd/compile/internal/gc/esc.go
case ODEFER:
    if e.loopdepth == 1 { // top level
        n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
        break
    }

If the Go compiler detects that the defer sits at loop depth 1 (top level), the escape-analysis result forces stack allocation; otherwise the _defer record is allocated on the heap.

// src/cmd/compile/internal/gc/ssa.go
case ODEFER:
    d := callDefer
    if n.Esc == EscNever {
        d = callDeferStack
    }
    s.call(n.Left, d)

This avoids the large performance overhead previously caused by frequent calls to systemstack, mallocgc, and other methods, improving performance in most scenarios.

Call defer in a loop

Back to the question itself. Now that we know the principle behind the defer optimization: does using the defer keyword in a loop cause any performance impact?

The most direct impact is that the roughly 30% performance optimization is lost. And because of the incorrect usage, defer's inherent overhead (the linked list growing longer) also increases, so performance degrades further.

Therefore, we should avoid the following two patterns of code:

  • Explicit loop: the defer keyword sits directly inside an explicit loop, for example a for-loop statement.
  • Implicit loop: the defer keyword sits inside logic that behaves like a loop, for example a goto statement.

Explicit loop

The first example uses the defer keyword inside a for loop:

package main

import "fmt"

func main() {
    for i := 0; i <= 99; i++ {
        defer func() {
            fmt.Println("脑子进煎鱼了")
        }()
    }
}

This is also the most common pattern. Many people like writing this when building a crawler or launching Goroutines.

This is an explicit loop around the defer call.

Implicit loop

The second example uses the goto keyword:

package main

func main() {
    i := 1
food:
    defer func() {}()
    if i == 1 {
        i -= 1
        goto food
    }
}

This kind of code is relatively rare, since goto is sometimes banned outright by coding standards because it is prone to abuse, so most people choose other ways to implement the same logic.

This is an implicit loop: the goto produces loop-like behavior around the defer.

To sum up

Obviously, there is nothing particularly magical about defer's design. It has mainly been optimized for some real application scenarios to achieve better performance.

Although defer itself carries a small overhead, it is not as unusable as some imagine. Unless the code containing defer sits on a path that executes very frequently, you don't need to consider optimizing it away.

Otherwise there is no need to agonize over it. When you suspect or hit a performance problem, look at the PProf profile, check whether defer is actually on the hot path, and then optimize reasonably.

The so-called optimization may simply be removing defer and calling the cleanup manually, which is not complicated. When coding, avoid the two minefields of explicit and implicit loops around defer to get the best performance.

If you have any questions, please leave a comment for discussion. The best relationship is mutual achievement; your likes are fried fish's biggest motivation to keep creating. Thanks for your support.

