
In the previous article, we described the basic scheduling process, but we have not yet covered what happens when a goroutine blocks. For example, if a G blocks on a channel send or receive, it must not keep occupying M's resources; if every blocked G held on to its M, all Ms could end up blocked. So we need to temporarily suspend the current G and reschedule it to run once the blocking operation completes.

Therefore, we need a scheduling mechanism that releases the resources held by a blocked G in time, re-triggers the scheduler, suspends the current G, and lets other runnable Gs execute, thereby maximizing resource utilization.

Blocking that the runtime can intercept

What can the runtime intercept? Generally, it is blocking caused by our own Go code, which falls roughly into these categories:

  • Channel send/receive blocking
  • select
  • Locks (`sync.Mutex`)
  • time.Sleep
  • Network I/O

When the runtime can intercept the block, G is first placed into some runtime data structure; once it becomes ready, G is rescheduled and continues execution, and the thread M is not held for the duration of the block. Next, let's take the first case, channels, as an example to see how this process works.

Take channel blocking as an example

As mentioned in the channel example above, a G blocked on a channel must passively give up its P and M. Let's walk through this process with a channel example. Suppose we have this line of code:

ch <- 1

This is an unbuffered channel, and we are writing the value 1 into it. If no consumer has received yet, this write blocks. The channel's data structure is called hchan:

type hchan struct {
    // number of elements currently in the queue
    qcount   uint
    // size of the circular queue
    dataqsiz uint
    // pointer to the circular queue that buffers channel data
    buf      unsafe.Pointer
    // size of each element in the channel
    elemsize uint16
    // whether the channel is closed
    closed   uint32
    // element type of the channel
    elemtype *_type
    // send index into the circular queue
    sendx    uint
    // receive index into the circular queue
    recvx    uint
    // queue of goroutines waiting to receive
    recvq    waitq
    // queue of goroutines waiting to send
    sendq    waitq
    // mutex protecting concurrent access to hchan (more on this below)
    lock mutex
}

Here we focus on two fields, recvq and sendq. They are linked lists holding the Gs blocked on this channel's receive and send sides. The statement ch <- 1 above is actually implemented by the chansend function:

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...
    // Try to dequeue a waiting receiver from recvq. If it is non-empty,
    // we found a receiver and can hand the data over directly, then return.
    if sg := c.recvq.dequeue(); sg != nil {
        send(c, sg, ep, func() { unlock(&c.lock) }, 3)
        return true
    }

    // Reaching here means there is no receiver; we must block and wait
    // for one (in the unbuffered-channel case).
    if !block {
        unlock(&c.lock)
        return false
    }
    
    // Package the current channel and G into a sudog structure
    // (we will explain why below).
    gp := getg()
    mysg := acquireSudog()
    mysg.g = gp
    mysg.c = c    
    ... 
    // Put the sudog onto sendq.
    c.sendq.enqueue(mysg)
    
    // Call gopark. Internally it unbinds M from G and triggers
    // another round of the scheduling loop.
    gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceEvGoBlockSend, 2)
    
    return true
}

Let's summarize the chansend flow (taking an unbuffered channel as an example):

  • Try to dequeue a waiting consumer from recvq
  • If recvq is not empty, send the data directly; if it is empty, we need to block
  • Acquire a sudog structure and assign the current G to its g field
  • Enqueue the sudog on sendq and wait to be woken up
  • Call gopark to unbind M from G and trigger another round of scheduling, so M can execute other Gs

Why sudog instead of G

So why use a sudog here instead of the original G structure? The answer is that one G can sit on multiple waiting lists. Both recvq and sendq are waitq structures, which are doubly linked lists. If the G itself were linked into the list, it would have to store the address of the next G to complete the doubly linked list, something like:

type g struct {
    next *g
    prev *g
}

But a G may be hung on several waiting lists at once (for example, in a select, one G can block on multiple channels), so next and prev in g would need multiple values: their addresses can differ on each waiting list. The relationship between G and waiting lists is many-to-many, so prev and next cannot be maintained directly on G. Instead, G and the channel are packaged together into a sudog structure. This is similar to the intermediate table used for many-to-many relationships in MySQL, which effectively maintains a (g_id, channel_id) pair:

type sudog struct {

    // the original G structure; analogous to g_id
    g *g

    // pointers within the waiting list
    next *sudog
    prev *sudog

    // the channel this sudog belongs to; analogous to channel_id
    c    *hchan
}
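The many-to-many relationship is visible from user code: a single select parks one goroutine on two channels at once, and the runtime creates one sudog per channel, all pointing back at the same g. A minimal sketch (selectWait is our own helper, not runtime API):

```go
package main

import "fmt"

// selectWait blocks one goroutine on two channels at once. While it is
// parked, the runtime links one sudog into a's recvq and another into
// b's recvq; both sudogs reference the same underlying g.
func selectWait(a, b chan string) string {
	select {
	case v := <-a:
		return v
	case v := <-b:
		return v
	}
}

func main() {
	a, b := make(chan string), make(chan string)
	done := make(chan string)
	go func() { done <- selectWait(a, b) }()
	b <- "from b" // wakes the selecting G; the sudog left on a's recvq is cleaned up
	fmt.Println(<-done)
}
```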

The net effect: each waiting list links sudog nodes rather than Gs, and each sudog points back to both its G and its channel.

gopark

We know that after the sudog is packaged and placed on sendq, gopark is called to execute the blocking logic. gopark in turn calls the park_m method, switches to the g0 stack, unbinds M from the current G, and triggers another round of scheduling so that M can bind and execute other Gs:

// park continuation on g0.
func park_m(gp *g) {
    _g_ := getg()

    // set G's status to _Gwaiting
    casgstatus(gp, _Grunning, _Gwaiting)
    
    // unbind M from G
    dropg()

    // run the scheduling loop again
    schedule()
}

When to wake up

Now the question is: the current G is blocked on sendq, so who wakes it up so it can continue executing? Obviously, it is the receiving side of the channel. In the source code, the counterpart of chansend is chanrecv:

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {

    ...
    // Try to dequeue a waiting goroutine from sendq.
    if sg := c.sendq.dequeue(); sg != nil {
        // If we got one, receive the data; the channel in our example
        // above falls into this case.
        recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
        return true, true
    }

    if !block {
        unlock(&c.lock)
        return false, false
    }

    // As above: package a sudog and hang it on the recvq waiting list.
    gp := getg()
    mysg := acquireSudog()
    mysg.g = gp
    mysg.c = c
    c.recvq.enqueue(mysg)

    // Likewise, with nothing on sendq, call gopark to block.
    gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceEvGoBlockRecv, 2)
    ...
    
    return true, success
}

goready

As we can see, the logic of chanrecv roughly mirrors chansend, so we won't repeat it. Since there was a sudog on our sendq just now, execution enters the recv() method to receive the data, which in turn calls goready():

// Mark gp ready to run.
func ready(gp *g, traceskip int, next bool) {

    ...
    status := readgstatus(gp)

    // mark G as _Grunnable
    _g_ := getg()
    casgstatus(gp, _Gwaiting, _Grunnable)
    
    // put it into a run queue for the scheduling loop to consume
    runqput(_g_.m.p.ptr(), gp, next)
    
    // wake an idle P to execute the G
    wakep()
    releasem(mp)
}

goready and gopark are a pair of operations: gopark blocks, goready wakes up. goready takes the G bound in the sudog, passes it into ready(), flips its status from _Gwaiting to _Grunnable, and calls runqput to place G in P's local run queue (or the global queue) to await the scheduling loop. With that, the whole process comes full circle.

In conclusion:

  • A sender suspended via gopark must be woken by a receiver (or by close) via goready
  • A receiver suspended via gopark must be woken by a sender (or by close) via goready
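Both rules can be checked from user code. In this sketch (recvUntilClose is our own helper), a receiver parks on an empty channel and close() wakes it, yielding the element type's zero value with ok == false:

```go
package main

import "fmt"

// recvUntilClose parks a receiver on an empty channel; close(ch) from
// another goroutine wakes it (closechan walks recvq and readies every
// parked sudog), and the receive reports the zero value and ok == false.
func recvUntilClose() (int, bool) {
	ch := make(chan int)
	go close(ch) // the "sender side" here is a close, which also wakes waiters
	v, ok := <-ch
	return v, ok
}

func main() {
	v, ok := recvUntilClose()
	fmt.Println(v, ok) // 0 false
}
```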

Blocking that the runtime cannot intercept

What can't the runtime intercept? CGO code and system calls. We won't cover CGO here. A system call may also block, but that block happens below the runtime level: the runtime cannot place the G into one of its waiting data structures, so it cannot capture the block directly.

This is where sysmon, a special background monitoring thread, comes in. It runs periodically and continuously on its own dedicated M, and it executes without needing a bound P, giving it the highest priority.

The core of sysmon is the sysmon() method. Its monitoring loop calls the retake() method to preempt Ps that have been blocked for a long time; the function iterates over all Ps in the runtime. The implementation of retake() is as follows:

func retake(now int64) uint32 {
    n := 0
    for i := 0; i < len(allp); i++ {
        _p_ := allp[i]
        pd := &_p_.sysmontick
        s := _p_.status
        // If the P is in _Prunning or _Psyscall and more than 10ms
        // (forcePreemptNS) has passed since its last scheduling event,
        // call preemptone() to preempt it.
        if s == _Prunning || s == _Psyscall {
            t := int64(_p_.schedtick)
            if pd.schedwhen+forcePreemptNS <= now {
                preemptone(_p_)
            }
        }
        // If the P is blocked in a system call, and its run queue is
        // non-empty, or there are no idle/spinning Ps, or the syscall
        // has lasted longer than 10ms, call handoffp to detach the P
        // from its M.
        if s == _Psyscall {
            if runqempty(_p_) && atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) > 0 && pd.syscallwhen+10*1000*1000 > now {
                continue
            }
            if atomic.Cas(&_p_.status, s, _Pidle) {
                n++
                _p_.syscalltick++
                handoffp(_p_)
            }
        }
    }
    return uint32(n)
}

By preempting Ps from its background monitoring loop, sysmon prevents a single G from occupying an M for too long, avoiding long-term blocking and starvation.
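The handoff is observable from user code. In the sketch below (progressDuringSyscall is our own helper; the iteration window is an arbitrary assumption), one goroutine blocks in a real read(2) on a pipe while GOMAXPROCS is 1. The counting loop can keep making progress only because the runtime detaches P from the M stuck in the syscall:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// progressDuringSyscall blocks one goroutine in a blocking pipe read
// (a system call the runtime cannot intercept with gopark) while
// GOMAXPROCS is 1, then counts how much work the other goroutine
// still manages to do.
func progressDuringSyscall() int {
	runtime.GOMAXPROCS(1)
	r, w, _ := os.Pipe()
	defer r.Close()
	defer w.Close()

	started := make(chan struct{})
	go func() {
		close(started)
		buf := make([]byte, 1)
		r.Read(buf) // blocks in the kernel until something is written
	}()
	<-started // ensure the reader goroutine has been scheduled

	n := 0
	deadline := time.Now().Add(50 * time.Millisecond)
	for time.Now().Before(deadline) {
		n++ // keeps running because the blocked M's P was handed off
	}
	w.Write([]byte{0}) // unblock the reader
	return n
}

func main() {
	fmt.Println("iterations during the blocking syscall:", progressDuringSyscall())
}
```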
