Explain the WaitGroup design in Go

Goroutines, the coroutines provided by the Go language, make it easy to write concurrent programs, but how to control these concurrently executing goroutines effectively is a problem we need to explore. As the author described in "Golang Concurrency Control Brief", among the synchronization primitives provided by the Go standard library, locks and atomic operations focus on keeping data safe between goroutines, while WaitGroup, channel and Context control their concurrent behavior. The implementation principles of locks, atomic operations, and channels have all been analyzed in detail in earlier articles. Therefore, this article focuses on WaitGroup.

A first look at WaitGroup

WaitGroup is a type in the sync package that is used to synchronize goroutines. Its usage scenario matches its name: when we need to wait for a group of goroutines to complete before performing subsequent processing, we can consider using it.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup

    wg.Add(2) // two workers

    go func() {
        // worker 1 does something
        fmt.Println("goroutine 1 done!")
        wg.Done()
    }()

    go func() {
        // worker 2 does something
        fmt.Println("goroutine 2 done!")
        wg.Done()
    }()

    wg.Wait() // wait until all workers are done
    fmt.Println("all work done!")
}

// output (the order of the first two lines may vary)
goroutine 2 done!
goroutine 1 done!
all work done!

As you can see, WaitGroup is very simple to use: it provides three methods. Although goroutines have no parent-child relationship, for ease of understanding this article calls the goroutine that calls Wait the main goroutine and the goroutines that call Done the child goroutines.

func (wg *WaitGroup) Add(delta int)  // add delta to the WaitGroup's child-goroutine counter
func (wg *WaitGroup) Done()          // decrement the counter by 1 when a child goroutine finishes its task
func (wg *WaitGroup) Wait()          // block the calling goroutine until the counter is 0

So how is it implemented? In the source file src/sync/waitgroup.go, the core code is under 100 lines. It is very compact and worth studying.

Prerequisite knowledge

Less code does not mean the implementation is simple to understand. On the contrary, without the prerequisite knowledge below, truly understanding the implementation of WaitGroup takes considerable effort. Before parsing the source code, let's go through this knowledge first (if you already have it, skip directly to the source code analysis section).

Semaphore

When learning operating systems, we know that semaphores are a mechanism to protect shared resources and are used to solve the problem of multi-thread synchronization. The semaphore s is a global variable with a non-negative integer value, which can only be handled by two special operations. These two operations are called P and V .

P(s): If s is non-zero, P decrements s by 1 and returns immediately. If s is zero, the thread is suspended until s becomes non-zero and a V(s) operation wakes it up. After waking up, the P operation decrements s by 1 and returns control to the caller.
V(s): The V operation adds 1 to s. If any threads are blocked in a P operation waiting for s to become non-zero, the V operation wakes up one of them, and that thread then decrements s by 1 to complete its P operation.
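
To make the P and V description concrete, here is a minimal sketch of a counting semaphore built on sync.Mutex and sync.Cond. The Semaphore type and its methods are hypothetical names chosen for illustration; this is not how the Go runtime implements its semaphores.

```go
package main

import (
	"fmt"
	"sync"
)

// Semaphore is a hypothetical counting semaphore used only for illustration;
// it is not the runtime's implementation.
type Semaphore struct {
	mu   sync.Mutex
	cond *sync.Cond
	s    int
}

// NewSemaphore creates a semaphore whose value starts at initial.
func NewSemaphore(initial int) *Semaphore {
	sem := &Semaphore{s: initial}
	sem.cond = sync.NewCond(&sem.mu)
	return sem
}

// P blocks while s is zero, then decrements s.
func (sem *Semaphore) P() {
	sem.mu.Lock()
	for sem.s == 0 {
		sem.cond.Wait()
	}
	sem.s--
	sem.mu.Unlock()
}

// V increments s and wakes one goroutine blocked in P, if any.
func (sem *Semaphore) V() {
	sem.mu.Lock()
	sem.s++
	sem.mu.Unlock()
	sem.cond.Signal()
}

// Value returns the current value of s.
func (sem *Semaphore) Value() int {
	sem.mu.Lock()
	defer sem.mu.Unlock()
	return sem.s
}

func main() {
	sem := NewSemaphore(0)
	done := make(chan struct{})

	go func() {
		sem.P() // blocks until main calls V (or returns at once if V ran first)
		fmt.Println("woken up")
		close(done)
	}()

	sem.V()
	<-done
	fmt.Println("final value:", sem.Value()) // final value: 0
}
```

The loop around cond.Wait() matters: a woken goroutine rechecks s before decrementing, which matches the rule that P only completes when s is non-zero.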

Go's runtime provides the corresponding semaphore functions:

runtime_Semacquire(s *uint32) blocks the goroutine until *s is greater than 0 and then atomically decrements it; this is the P operation.
runtime_Semrelease(s *uint32, handoff bool, skipframes int) atomically increments the semaphore value and then notifies a goroutine blocked in runtime_Semacquire; this is the V operation.

These two semaphore functions are not used only in WaitGroup. In the article "Go's exquisite mutex design", we saw that Go's mutex implementation also relies on semaphores.

Memory alignment

For the following structure, can you answer how much memory it occupies?

package main

import (
    "fmt"
    "unsafe"
)

type Ins struct {
    x bool  // 1 byte
    y int32 // 4 bytes
    z byte  // 1 byte
}

func main() {
    ins := Ins{}
    fmt.Printf("ins size: %d, align: %d\n", unsafe.Sizeof(ins), unsafe.Alignof(ins))
}

// output
ins size: 12, align: 4

Adding up the sizes of the fields, the ins object should occupy 1+4+1=6 bytes, but it actually occupies 12 bytes because of memory alignment: x sits at offset 0 and is followed by 3 bytes of padding so that y starts at offset 4, and z at offset 8 is followed by 3 more bytes of padding so that the total size is a multiple of the 4-byte alignment. From the article "The Impact of CPU Cache System on Go Programs", we know that the CPU does not read memory byte by byte but block by block. Therefore, when values are aligned in memory, loads and stores are very efficient.

The memory occupied by an aggregate type (structure or array) may be larger than the sum of the memory occupied by its elements. The compiler inserts unused padding bytes so that each member or element is properly aligned relative to the start of the structure or array.

Therefore, when a structure has members of different types, placing members of the same type next to each other can save memory.
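
As a small illustration of that advice, the sketch below compares the original Ins layout with a reordered variant. InsCompact is a hypothetical type introduced only for this comparison, and the sizes assume the standard Go compiler, where int32 has 4-byte alignment.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Ins is the layout from the text: padding follows both x and z.
type Ins struct {
	x bool  // offset 0, then 3 bytes of padding
	y int32 // offset 4
	z byte  // offset 8, then 3 bytes of trailing padding
}

// InsCompact is a hypothetical reordering that keeps the two 1-byte fields adjacent.
type InsCompact struct {
	y int32 // offset 0
	x bool  // offset 4
	z byte  // offset 5, then 2 bytes of trailing padding
}

// sizes returns the sizes of Ins and InsCompact in bytes.
func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(Ins{}), unsafe.Sizeof(InsCompact{})
}

func main() {
	a, b := sizes()
	fmt.Printf("Ins: %d bytes, InsCompact: %d bytes\n", a, b) // Ins: 12 bytes, InsCompact: 8 bytes
}
```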

Atomic operation CAS

CAS (compare-and-swap) is an atomic operation used to exchange data in multi-threaded programming. It avoids the data inconsistency that arises when several threads rewrite the same data at once and both the order of execution and the points of interruption are unpredictable. The operation compares the value in memory with a specified value and replaces the value in memory with a new one only if the two are equal. For the low-level implementation of atomic operations in Go, the author gives a detailed introduction in the article "The Cornerstone of Synchronization Primitives".
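
A typical way to use CAS is a retry loop: read the current value, compute the new one, and retry if another goroutine changed the value in between. The sketch below uses atomic.CompareAndSwapInt32 from sync/atomic; the helper names casIncrement and countWithCAS are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casIncrement adds 1 to *addr using a compare-and-swap retry loop.
func casIncrement(addr *int32) {
	for {
		old := atomic.LoadInt32(addr)
		if atomic.CompareAndSwapInt32(addr, old, old+1) {
			return
		}
		// Another goroutine changed the value between the load and the CAS; retry.
	}
}

// countWithCAS runs n goroutines that each increment a shared counter once.
func countWithCAS(n int) int32 {
	var counter int32
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			casIncrement(&counter)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&counter)
}

func main() {
	fmt.Println(countWithCAS(100)) // 100
}
```

The same load, compute, CAS, retry shape appears in the Wait function analyzed later in this article.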

Shift operations >> and <<

In the previous articles about locks "Go's subtle mutex lock design" and "Go's more fine-grained read-write lock design", we can see a large number of bit operations. Flexible bit operations can make an ordinary number change with rich meanings. Here we only introduce the shift operations that will be used in the following.

For the left shift operation <<, all bits of the number are moved left by the given number of positions; the high bits are discarded and the low bits are filled with zeros. As long as the number does not overflow, shifting left by one bit is equivalent to multiplying by 2, and shifting left by n bits is equivalent to multiplying by 2 to the power of n.

For the right shift operation >>, all bits are moved right by the given number of positions and the low bits are shifted out. For signed integers the vacated high bits are filled with the sign bit; for unsigned integers they are filled with zeros. Shifting right by one bit is equivalent to dividing by 2, and shifting right by n bits is equivalent to dividing by 2 to the power of n, keeping only the quotient and discarding the remainder (for negative numbers the result is rounded toward negative infinity).

Shift operations also allow some very clever tricks, and we will see a more advanced application of them later.
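
The rules above can be checked with a few concrete values. This is only a small sketch; shiftExamples is a hypothetical helper name.

```go
package main

import "fmt"

// shiftExamples returns a few shift results for illustration.
func shiftExamples() (int, int, int, uint8) {
	left := 3 << 4   // 3 * 2^4 = 48
	right := 17 >> 2 // 17 / 2^2 = 4, the remainder 1 is dropped

	n := -16
	negRight := n >> 2 // the sign bit fills the high bits: -4

	var u uint8 = 0xF0
	lost := u << 1 // the top bit does not fit in 8 bits and is discarded: 0xE0

	return left, right, negRight, lost
}

func main() {
	fmt.Println(shiftExamples()) // 48 4 -4 224
}
```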

unsafe.Pointer and uintptr

Pointers in Go can be divided into three categories: 1. ordinary typed pointers *T, such as *int; 2. unsafe.Pointer; 3. uintptr.

*T: an ordinary pointer type, used to pass the address of an object. Pointer arithmetic is not possible.
unsafe.Pointer: a generic pointer. Any ordinary pointer *T can be converted to unsafe.Pointer, and an unsafe.Pointer can be converted back to an ordinary pointer whose type need not be the same as the original *T. It cannot be used for pointer arithmetic, and the value in memory cannot be read through it directly (it must first be converted to an ordinary pointer of a specific type).
uintptr: strictly speaking, uintptr is not a pointer but an unsigned integer type whose size depends on the platform. unsafe.Pointer and uintptr can be converted to each other. Because a uintptr holds the numeric value of the address the pointer refers to, pointer arithmetic can be performed on that value. During GC a uintptr is not treated as a pointer, so the object a uintptr refers to may be reclaimed.

unsafe.Pointer is a bridge: it allows ordinary pointers of any type to be converted to each other, and it allows a pointer of any type to be converted to uintptr for pointer arithmetic. However, converting between unsafe.Pointer and arbitrary pointer types lets us write arbitrary values into memory, which breaks Go's type system. Likewise, because not every value is a legal memory address, converting from uintptr to unsafe.Pointer also breaks the type system. Go named the package unsafe for a reason, and it should not be used casually.
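
The two roles of unsafe.Pointer described above (bridging between pointer types, and pointer arithmetic through uintptr) can be sketched as follows. The pair type and both helper functions are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is a hypothetical struct used only for this illustration.
type pair struct {
	a int32
	b int32
}

// secondField reaches p.b through pointer arithmetic instead of naming the field.
func secondField(p *pair) *int32 {
	// The conversion to uintptr, the arithmetic, and the conversion back are
	// kept in a single expression, as the unsafe package's rules require.
	return (*int32)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + unsafe.Offsetof(p.b)))
}

// floatBits reinterprets the bytes of a float32 as a uint32.
func floatBits(f float32) uint32 {
	return *(*uint32)(unsafe.Pointer(&f))
}

func main() {
	p := &pair{a: 1, b: 2}
	*secondField(p) = 42
	fmt.Println(p.b) // 42

	fmt.Printf("%#x\n", floatBits(1.0)) // 0x3f800000
}
```

Storing the intermediate uintptr in a variable and converting it back later would not be safe, because the GC does not treat that integer as a reference to the object.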

Source code analysis

This article is based on Go source code version 1.15.7

Structure

The sync.WaitGroup structure is defined as follows. It contains an auxiliary noCopy field and a state1 field that carries a compound meaning.

type WaitGroup struct {
    noCopy noCopy

    // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
    // 64-bit atomic operations require 64-bit alignment, but 32-bit
    // compilers do not ensure it. So we allocate 12 bytes and then use
    // the aligned 8 bytes in them as state, and the other 4 as storage
    // for the sema.
    state1 [3]uint32
}

// state returns pointers to the state and sema fields stored within wg.state1.
func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
    // If the address of state1 is divisible by 8, it is 64-bit aligned.
    if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
    }
}

The noCopy field is an empty structure: it occupies no memory, and the compiler adds no padding for it. Its main purpose is to let the go vet tool check statically that developers do not copy a WaitGroup while using it, which could cause safety problems. For details on this part, you can read "no copy mechanism".

state1 is a uint32 array of length 3. It represents three things: 1. the counter of child goroutines, set through Add(); 2. the waiter count, that is, the number of goroutines blocked in Wait(); 3. the semaphore semap.

All subsequent operations work on statep, which is a *uint64, and atomic operations on 64-bit integers require 64-bit alignment, which 32-bit compilers cannot guarantee. The layout of state1 therefore depends on alignment: when the array is 64-bit aligned, its first two elements form the state and state1[2] is the semaphore; otherwise state1[0] is the semaphore and the last two elements form the state.

It should be noted that when we initialize a WaitGroup object, its counter value, waiter value, and semap value are all 0.
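
To see why 12 bytes are always enough to find an aligned 8-byte window, the sketch below applies the same alignment check to a plain [3]uint32. splitState and isAligned8 are hypothetical helpers written for this illustration, not part of the sync package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// splitState is a hypothetical helper that applies the same alignment check
// to a plain array: it returns an 8-byte-aligned *uint64 and the leftover uint32.
func splitState(buf *[3]uint32) (statep *uint64, semap *uint32) {
	if uintptr(unsafe.Pointer(buf))%8 == 0 {
		return (*uint64)(unsafe.Pointer(buf)), &buf[2]
	}
	// buf is 4-byte aligned, so if it is not 8-byte aligned then buf[1] is.
	return (*uint64)(unsafe.Pointer(&buf[1])), &buf[0]
}

// isAligned8 reports whether p is 64-bit aligned.
func isAligned8(p *uint64) bool {
	return uintptr(unsafe.Pointer(p))%8 == 0
}

func main() {
	var buf [3]uint32
	statep, semap := splitState(&buf)

	*statep = 3<<32 | 1 // counter 3 in the high half, waiter 1 in the low half
	*semap = 7          // the semaphore word does not overlap the state

	fmt.Println(isAligned8(statep))                   // true
	fmt.Println(*statep>>32, uint32(*statep), *semap) // 3 1 7
}
```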

Add function

The parameter of the Add() function is an integer that may be positive or negative; it is the change applied to the counter. If the counter becomes 0, all goroutines blocked in Wait() are woken up; if the counter becomes negative, Add() panics.

With the race-detection code removed, the implementation of Add() is as follows.

func (wg *WaitGroup) Add(delta int) {
    // Get statep, the compound state holding counter and waiter, and semap, the semaphore value.
    statep, semap := wg.state()
    state := atomic.AddUint64(statep, uint64(delta)<<32)
    v := int32(state >> 32)
    w := uint32(state)
  
    if v < 0 {
        panic("sync: negative WaitGroup counter")
    }

    if w != 0 && delta > 0 && v == int32(delta) {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
  
    if v > 0 || w == 0 {
        return
    }

    if *statep != state {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
  
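    // Reaching this point means counter == 0 and waiter > 0.
    // Only a goroutine that called Add with a negative delta can get here,
    // and its arrival means every child goroutine has finished its task.
    // So reset the compound state to 0 and release the semaphore once per waiter.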
    *statep = 0
    for ; w != 0; w-- {
        // Release the semaphore; each call wakes up one blocked waiter.
        runtime_Semrelease(semap, false, 0)
    }
}

The code is very concise. We will analyze the key parts next.

    state := atomic.AddUint64(statep, uint64(delta)<<32)  // add delta to the counter
    v := int32(state >> 32)   // counter value
    w := uint32(state)        // waiter value

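At this point statep points to a uint64. Suppose the counter held in statep is 2, the waiter count is 1, and the input delta is 1. The atomic add shifts delta into the high 32 bits and adds it, so the counter becomes 3 while the waiter count is untouched. Shifting the result right by 32 bits then gives v = 3, and truncating it to uint32 gives w = 1.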
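
The arithmetic of these three lines can be reproduced on a plain uint64 without atomics. addDelta below is a hypothetical helper for illustration; it also shows why a negative delta works: converting it to uint64 and shifting it left by 32 bits produces a value whose wrap-around addition decrements the high half.

```go
package main

import "fmt"

// addDelta applies the same arithmetic as the three lines above to a plain
// uint64 (no atomics) and returns the new state, the counter and the waiter count.
func addDelta(state uint64, delta int) (uint64, int32, uint32) {
	state += uint64(delta) << 32
	v := int32(state >> 32) // counter: high 32 bits
	w := uint32(state)      // waiter: low 32 bits
	return state, v, w
}

func main() {
	state := uint64(2)<<32 | 1 // counter 2, waiter 1

	state, v, w := addDelta(state, 1)
	fmt.Println(v, w) // 3 1

	_, v, w = addDelta(state, -1) // what Done() does
	fmt.Println(v, w)             // 2 1
}
```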

After the current counter number v and waiter number w are obtained, their values will be judged. There are several situations.

    // Case 1: a very basic error; the counter must not be negative.
    if v < 0 {
        panic("sync: negative WaitGroup counter")
    }

    // Case 2: panic caused by misuse.
    // A WaitGroup can be reused, but only after all of its state has been reset to 0.
    if w != 0 && delta > 0 && v == int32(delta) {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }

    // Case 3: this Add call only changes the counter, so it can return directly.
    // If the counter is still greater than 0, waking the waiters is left to a later Add caller (one that passes a negative delta).
    // If the waiter count is 0, no goroutine is blocked in Wait yet.
    if v > 0 || w == 0 {
        return
    }

    // Case 4: panic caused by misuse.
    if *statep != state {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }

The panics caused by misuse and reuse are hard to explain without sample code that triggers them. The Go source code provides examples of incorrect usage in the file src/sync/waitgroup_test.go. Readers who want to learn more can look at the following three test functions.

func TestWaitGroupMisuse(t *testing.T)
func TestWaitGroupMisuse2(t *testing.T)
func TestWaitGroupMisuse3(t *testing.T)
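
The misuse and reuse panics depend on races between goroutines and are hard to trigger deterministically, so they are not reproduced here. The simplest panic, case 1, is easy to observe, though. The sketch below calls Done() once more than Add() allows and recovers the panic; extraDone is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
)

// extraDone calls Done one more time than Add allowed and reports the panic message.
func extraDone() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // the counter would become -1, so Add(-1) panics
	return "no panic"
}

func main() {
	fmt.Println(extraDone()) // sync: negative WaitGroup counter
}
```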

Done function

The Done() function is simple: it just calls Add(-1). In practice, a child goroutine should call Done() when it has finished its task.

func (wg *WaitGroup) Done() {
    wg.Add(-1)
}
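
Because a missed Done() leaves Wait() blocked forever, a common pattern is to defer the call at the top of the child goroutine so that it runs on every return path. The following is a minimal sketch of that pattern; squareAll is a hypothetical function used only as an example workload.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll computes squares concurrently; each worker defers Done so the
// counter is decremented on every return path.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			if n == 0 {
				return // the early return still reaches Done
			}
			out[i] = n * n // each goroutine writes to its own index
		}(i, n)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 0, 3})) // [1 4 0 9]
}
```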

Wait function

If the counter in the WaitGroup is greater than 0, the Wait() function increments the waiter count by 1 and blocks until the counter reaches 0, after which the subsequent code continues to execute.

With the race-detection code removed, the implementation of Wait() is as follows.

func (wg *WaitGroup) Wait() {
    statep, semap := wg.state()
    for {
        state := atomic.LoadUint64(statep) // atomically read the compound state
        v := int32(state >> 32)            // counter value
        w := uint32(state)                 // waiter value

        // If v == 0, no child goroutine still has work to do, so return directly.
        if v == 0 {
            return
        }

        // If no other goroutine changed the compound state between the load above
        // and this CAS, increment the waiter count; otherwise loop and reload the state.
        if atomic.CompareAndSwapUint64(statep, state, state+1) {
            // The waiter count was incremented successfully.
            // Block until Add calls runtime_Semrelease to wake this goroutine.
            runtime_Semacquire(semap)

            // Reuse panic: the goroutine that woke us up already reset the state
            // with *statep = 0 inside Add. A non-zero state here means the
            // WaitGroup was reused before this Wait returned.
            if *statep != 0 {
                panic("sync: WaitGroup is reused before previous Wait has returned")
            }
            return
        }
    }
}
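
Since Add releases the semaphore once per waiter, several goroutines can block in Wait() on the same WaitGroup and all of them are released when the counter reaches 0. The sketch below checks this behavior from the outside; releaseWaiters is a hypothetical helper, and a second WaitGroup is used only to know when the waiters have finished.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// releaseWaiters starts n goroutines that all call Wait on the same
// WaitGroup and returns how many of them were released by a single Done.
func releaseWaiters(n int) int32 {
	var wg sync.WaitGroup       // the group being waited on
	var finished sync.WaitGroup // tracks the waiting goroutines themselves
	var released int32

	wg.Add(1)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			wg.Wait() // blocks, or returns at once if the counter is already 0
			atomic.AddInt32(&released, 1)
		}()
	}

	wg.Done() // the counter reaches 0 and every blocked waiter is woken
	finished.Wait()
	return atomic.LoadInt32(&released)
}

func main() {
	fmt.Println(releaseWaiters(3)) // 3
}
```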

Summary

To understand the source code of WaitGroup, we need some prerequisite knowledge, such as semaphores, memory alignment, atomic operations, shift operations, and pointer conversions.

But in fact, the implementation of WaitGroup is quite simple. Two counters and a semaphore are maintained in the structure field state1. The two counters are the counter of child goroutines added through Add() and the waiter count of goroutines blocked in Wait(); the semaphore is used to block and wake up the waiters. When Add(n) is executed with a positive n, counter += n, indicating that n child goroutines have been added to perform tasks. After each child goroutine completes its task, it must call Done() to decrement the counter by 1. When the last child goroutine completes, the counter becomes 0, and at that moment the goroutines blocked in Wait() must be woken up.

However, there are several points to note when using WaitGroup:

The total added through Add() must equal the total subtracted through Done(). If the former is larger, Wait() will never be woken up; if the latter is larger, a panic is triggered.
Add() with a positive delta should be executed first, before Wait() is called.
Do not copy a WaitGroup object and then use the copy.
If you want to reuse a WaitGroup, new Add() calls must happen only after all previous Wait() calls have returned.
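
These points can be combined in one small sketch: Add() is called before the goroutines start, every unit added is matched by one Done(), the WaitGroup is passed by pointer rather than copied, and each new round begins only after the previous Wait() has returned. runRound and runRounds are hypothetical helpers for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runRound starts the given number of workers on wg and waits for all of them.
// wg is passed by pointer so the WaitGroup is never copied.
func runRound(wg *sync.WaitGroup, workers int, total *int32) {
	wg.Add(workers) // Add runs before the goroutines start and before Wait
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done() // exactly one Done per unit added
			atomic.AddInt32(total, 1)
		}()
	}
	wg.Wait()
}

// runRounds reuses one WaitGroup; each round begins only after the
// previous call to Wait has returned.
func runRounds(rounds, workers int) int32 {
	var wg sync.WaitGroup
	var total int32
	for r := 0; r < rounds; r++ {
		runRound(&wg, workers, &total)
	}
	return atomic.LoadInt32(&total)
}

func main() {
	fmt.Println(runRounds(3, 4)) // 12
}
```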


机器铃砍菜刀
