1. What is a goroutine
In the Go language, each concurrent execution unit is called a goroutine.
When a program starts, its main function runs in its own goroutine, which we call the main goroutine; new goroutines are created with go statements. Syntactically, a go statement is an ordinary function or method call prefixed with the keyword go, which causes the function in the statement to run in a newly created goroutine.
```go
f()    // call f(); wait for it to return
go f() // create a new goroutine that calls f(); don't wait
```
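To make the difference concrete, here is a minimal runnable sketch (the say helper, the messages, and the sleep duration are invented for this example): the main goroutine starts a second goroutine with go and then waits on a channel, since it would otherwise exit without waiting for it.

```go
package main

import (
	"fmt"
	"time"
)

// say is a hypothetical helper used only for this illustration.
func say(msg string, done chan<- struct{}) {
	time.Sleep(100 * time.Millisecond) // simulate some work
	fmt.Println(msg)
	close(done) // signal completion
}

func main() {
	done := make(chan struct{})

	go say("hello from a new goroutine", done) // runs concurrently; main does not wait here

	fmt.Println("hello from the main goroutine")
	<-done // block until the new goroutine finishes; otherwise main may exit first
}
```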
A goroutine can be regarded as a layer of abstraction on top of threads: it is user-mode and lighter-weight. Because of this abstraction, Go programmers do not deal with threads directly, and the operating system is not even aware that goroutines exist; it simply keeps executing threads, which remain the basic unit of its scheduling.
What is the difference between a goroutine and a thread?
1. Memory usage <br>Creating a goroutine consumes only 2 KB of stack memory, and the stack grows automatically at run time if it is not enough. Creating a thread consumes about 1 MB of stack memory.
For an HTTP server written in Go, it is natural to handle each incoming request in its own goroutine (see the sketch after this list). If the same service is built in a language that uses threads as its concurrency primitive, such as Java, one thread per request wastes resources and can quickly lead to an OOM (Out Of Memory) error.
2. Creation and destruction
Creating and destroying a thread is expensive, because it has to go through the operating system at the kernel level; the usual workaround is a thread pool. Goroutines, by contrast, are managed by the Go runtime in user space, so the cost of creating and destroying them is very small.
3. Switching <br>When threads switch, many registers have to be saved so they can be restored later: the 16 general-purpose registers, the program counter, the stack pointer, the segment registers, the 16 XMM registers, the floating-point coprocessor state, the 16 AVX registers, all the MSRs, and so on.
Saving and restoring a goroutine only involves three registers: the program counter, the stack pointer, and the DX register, and it does not require trapping into the operating system kernel.
Roughly speaking, a thread switch costs 1000-1500 nanoseconds, while a goroutine switch costs about 200 ns, so the switching cost of goroutines is much smaller than that of threads.
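To make the per-request point above concrete, here is a minimal sketch of a Go HTTP server (the handler body and the :8080 port are chosen arbitrarily for the example). The net/http package serves each incoming connection in its own goroutine, so the handler may be running in many goroutines at once.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Each incoming connection is served in its own goroutine by net/http,
	// so this handler can run concurrently for many requests.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "handled %s %s\n", r.Method, r.URL.Path)
	})

	// ":8080" is an arbitrary port chosen for this example.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		fmt.Println("server error:", err)
	}
}
```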
2. The role of the Go scheduler
When it comes to "scheduling", the first thing that comes to our mind is the scheduling of processes and threads by the operating system. The operating system scheduler will schedule multiple threads in the system to run on the physical CPU according to a certain algorithm.
This traditional way of supporting concurrency has many shortcomings:
Although a thread is much cheaper than a process, we still cannot create threads in huge numbers: each thread occupies a lot of resources, and the cost of the operating system scheduling and switching between threads is not small either.
To the operating system, a Go program is just an ordinary user-level program; the OS only sees threads and does not even know that goroutines exist. The component that places these goroutines onto a "CPU" according to some algorithm is the goroutine scheduler. At the operating system level, the "CPU" that threads compete for is the real physical CPU; at the Go program level, the "CPU" that each goroutine competes for is an operating system thread.
With that said, the task of the goroutine scheduler is clear:
The goroutine scheduler keeps the number of threads roughly equal to the number of CPU cores, avoiding the overhead of frequent thread switching, and runs lower-overhead goroutines on top of each thread, reducing the load on the operating system and the hardware.
Workflow
```
                           +-------------------- sysmon ---------------//------+
                           |                                                   |
                           |                                                   |
               +---+      +---+-------+                   +--------+          +---+---+
go func() ---> | G | ---> | P | local | <=== balance ===> | global | <--//--- | P | M |
               +---+      +---+-------+                   +--------+          +---+---+
                              |                               |                   |
                              |      +---+                    |                   |
                              +----> | M | <-- findrunnable --+--- steal <--//----+
                                     +---+
                                       |
                                     mstart
                                       |
                              +--- execute <----- schedule
                              |                      |
                              |                      |
                              +--> G.fn --> goexit --+
```
1. A `go func()` statement creates a G.
2. The G is put into P's local run queue (or balanced into the global queue).
3. An M is woken up, or a new M is created, to execute the work.
4. The M enters the scheduling loop.
5. It does its best to obtain a runnable G and executes it.
6. After the G exits, the M cleans up and re-enters the scheduling loop.
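As a rough mental model of steps 4-6 (this is a toy sketch, not the real runtime code, which lives in runtime/proc.go; the types, names, and "steal half" policy below are simplifications invented for illustration), the search order an M follows when looking for work could be pictured like this:

```go
package main

import "fmt"

// G stands in for a runnable goroutine in this toy model.
type G struct{ id int }

// P holds a local run queue; the real runtime caps it at 256 entries.
type P struct{ runq []*G }

var globalQueue []*G // stand-in for the runtime's global run queue

// findRunnable mimics the search order described above:
// local queue first, then the global queue, then stealing from another P.
func findRunnable(p *P, others []*P) *G {
	if len(p.runq) > 0 { // 1) local queue
		g := p.runq[0]
		p.runq = p.runq[1:]
		return g
	}
	if len(globalQueue) > 0 { // 2) global queue
		g := globalQueue[0]
		globalQueue = globalQueue[1:]
		return g
	}
	for _, other := range others { // 3) steal half of another P's queue
		if n := len(other.runq); n > 0 {
			stolen := other.runq[:(n+1)/2]
			other.runq = other.runq[(n+1)/2:]
			p.runq = append(p.runq, stolen[1:]...)
			return stolen[0]
		}
	}
	return nil // nothing to run; a real M would park or spin here
}

func main() {
	p1 := &P{}
	p2 := &P{runq: []*G{{1}, {2}, {3}, {4}}}
	for g := findRunnable(p1, []*P{p2}); g != nil; g = findRunnable(p1, []*P{p2}) {
		fmt.Println("running G", g.id) // stand-in for execute -> G.fn -> goexit
	}
}
```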
3. Go scheduler model and evolution process
1. GM model
On March 28, 2012, Go 1.0 was officially released. In this version, each goroutine corresponds to an abstract structure in the runtime, G, and each OS thread is abstracted into a structure, M.
Whenever an M wants to execute a G or put a G back, it must access the global G queue; since there are multiple Ms, that is, multiple threads accessing the same resource, locking is needed for mutual exclusion and synchronization, so the global G queue is protected by a mutex.
Although this structure is simple, there are many problems:
1. All goroutine-related operations, such as creation and rescheduling, must take the lock;
2. Passing a G between Ms causes delay and extra system load, and it also hurts locality. For example, while M is executing G, G creates a new goroutine G1; to keep executing G, M has to hand G1 over to another M1, even though G and G1 are related and would ideally run on the same M;
3. System calls (and the switching of Ms on and off the CPU they cause) lead to frequent thread blocking and unblocking, which increases system overhead.
2. GPM model
The GPM scheduling model and work stealing algorithm were implemented in Go 1.1, and this model is still in use today:
In the new scheduler, in addition to M (thread) and G (goroutine), P (Processor) is introduced. P is a "logical processor". For a G to actually run, it first needs to be assigned to a P (that is, enter P's local queue); from G's point of view, P is the "CPU" that runs it, and there is only P in G's eyes. But from the Go scheduler's point of view, the real "CPU" is M, and only when a P is bound to an M can the Gs in P's runq run.
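To make the three roles concrete, here is a heavily simplified sketch; the field names are loosely inspired by the runtime's internal g, p, and m structs (defined in runtime/runtime2.go), but this is an illustrative model rather than the real definitions.

```go
package main

import "fmt"

// gobuf holds the minimal register state saved when switching goroutines.
type gobuf struct {
	sp, pc uintptr
}

// g: a goroutine; it owns its stack and the saved context used to resume it.
type g struct {
	stack []byte // the goroutine's (growable) stack
	sched gobuf  // saved SP/PC for switching
	m     *m     // the M currently executing this G, if any
}

// p: a logical processor; it owns a local run queue of Gs (capped at 256).
type p struct {
	runq     [256]*g
	runqhead uint32
	runqtail uint32
	m        *m // the M this P is currently bound to, if any
}

// m: an OS thread; it needs to acquire a P before it can run user Gs.
type m struct {
	g0   *g // the scheduling goroutine, with its own stack
	curg *g // the user G currently running on this thread
	p    *p // the P this M has acquired, or nil
}

func main() {
	proc := &p{}
	thread := &m{p: proc}
	goroutine := &g{m: thread}
	proc.runq[proc.runqtail] = goroutine // enqueue the G on the P's local queue
	proc.runqtail++
	fmt.Println("Gs queued on this P:", proc.runqtail-proc.runqhead)
}
```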
3. Preemptive scheduling
The GPM model was a major improvement to the Go scheduler, but the scheduler still had a headache: it did not support preemptive scheduling. Once a G contained an infinite loop or other never-ending code, that G would permanently occupy the P and M assigned to it, the other Gs on the same P would never be scheduled, and they would "starve". Worse, when there was only one P (GOMAXPROCS=1), all the other Gs in the Go program would "starve to death".
The preemptive scheduling that was then added works by inserting an extra piece of code at the entry of every function or method, so that the runtime gets a chance to check whether a preemptive reschedule is needed.
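A minimal sketch of the starvation problem (the loop body and the printed message are invented for the example): with GOMAXPROCS set to 1, a goroutine that spins in a loop containing no function calls gives this function-entry-based preemption nothing to hook into, so on Go versions before asynchronous preemption (added in Go 1.14) the program below could hang; on modern Go the runtime can still interrupt the loop.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1) // only one P: all Gs must share a single logical processor

	go func() {
		for {
			// A tight loop with no function calls: the preemption check at
			// function entry never runs here. Before Go 1.14 this goroutine
			// could monopolize the only P; with asynchronous preemption
			// (Go 1.14+) the runtime can still interrupt it.
		}
	}()

	runtime.Gosched() // yield, letting the spinning goroutine take the P
	fmt.Println("main still gets to run (on modern Go)")
}
```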
4. In-depth understanding of the GPM model
1. Basic Concepts
1. Global queue: holds Gs that are waiting to run.
2. P's local queue: like the global queue, it also holds Gs waiting to run, but it stores no more than 256 of them.
3. G: a goroutine, the most basic unit of the scheduling system; it carries the pointer to the function to run, its stack, and its context.
4. P: Processor, the logical processor used for scheduling; it owns various queues and lists of G objects, some caches, and state. The number of Ps determines the maximum number of Gs that can run in parallel in the system.
All Ps are created when the program starts and are kept in an array; by default their number equals the number of CPU cores, and there are at most GOMAXPROCS of them (configurable).
5. M: the entity that actually does the work, corresponding to an operating-system thread; each M is like a diligent worker, finding runnable Gs from the various queues.
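As a small illustration of these knobs (the printed values will of course vary by machine), the following program inspects them at run time; runtime.GOMAXPROCS(0) queries the current setting without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("CPU cores:      ", runtime.NumCPU())       // how many cores the machine has
	fmt.Println("GOMAXPROCS (P): ", runtime.GOMAXPROCS(0))  // number of Ps; 0 means "just query"
	fmt.Println("goroutines (G): ", runtime.NumGoroutine()) // Gs that currently exist

	// The number of Ps can also be changed at run time, or via the
	// GOMAXPROCS environment variable before the program starts.
	prev := runtime.GOMAXPROCS(2) // use 2 Ps from now on; returns the previous value
	fmt.Println("previous GOMAXPROCS:", prev)
}
```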
Here is a picture to illustrate their relationship:
The gopher uses a cart to carry a pile of bricks to be processed. M can be regarded as the gopher in the picture, P is the cart, and G is the bricks in the cart.
The creation and scheduling cycle of goroutine is a production-consumption process, and the operation of the entire go program is to continuously execute the production and consumption process of goroutine.
2. The design strategy of the scheduler
2.1. Multiplexing threads
Avoid frequently creating and destroying threads; reuse them instead.
1) Work stealing mechanism: when a thread has no runnable G, it tries to steal Gs from the P bound to another thread instead of destroying itself.
2) Hand off mechanism: when a thread blocks because the G it is running enters a blocking system call, the thread releases its bound P and hands the P over to another idle thread to keep executing.
2.2. Using parallelism
GOMAXPROCS sets the number of Ps, so at most GOMAXPROCS threads are spread across multiple CPUs running Go code at the same time.
2.3. Preemption
When a Go program starts, the runtime starts an M named sysmon, which periodically issues a preemption request to any G that has been running for a long time (>= 10ms), so that other goroutines are not starved.
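One way to watch these strategies at work is the runtime's scheduler trace: setting the GODEBUG=schedtrace=1000 environment variable makes the runtime print a line of scheduler state roughly every 1000 ms (number of Ps, idle Ps, thread counts, spinning threads, and run-queue lengths). The busy-loop program below is just a throwaway workload invented to give the trace something to show.

```go
// Build the program, then run it with the scheduler trace enabled, e.g.:
//   GODEBUG=schedtrace=1000 ./yourprogram
// Each trace line looks roughly like (exact fields vary by Go version):
//   SCHED 1009ms: gomaxprocs=8 idleprocs=0 threads=10 spinningthreads=0 idlethreads=1 runqueue=12 [4 7 3 9 2 5 8 6]
package main

import (
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			deadline := time.Now().Add(3 * time.Second)
			for time.Now().Before(deadline) {
				// burn CPU so run queues and thread counts have something to show
			}
		}()
	}
	wg.Wait()
}
```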
3. Goroutine scheduling process
1. We create a goroutine through go func();
2. There are two kinds of queues that store Gs: the local queue of a P, and the global G queue. A newly created G is put into the local queue of the current P first; if that local queue is full, it is put into the global queue;
3. G can only run in M, an M must hold a P, and the relationship between M and P is 1:1;
4. An M schedules and executes Gs in a loop. The M pops a runnable G from its P's local queue and executes it; if the local queue is empty, it looks at the global queue; if the global queue is also empty, it tries to steal a runnable G from the P of another M; if it still cannot get one, the M spins;
5. If a G blocks on a channel operation or a network I/O operation, the G is placed in a wait queue, and the M tries to run the next runnable G of its P. If the P has no runnable G left for the M, the M unbinds from the P and goes to sleep.
When the I/O or channel operation completes, the G in the wait queue is woken up, marked as runnable, put into the queue of some P, and bound to an M to continue executing.
6. If a G blocks on a system call, not only does the G block, but the M executing it also unbinds from its P and blocks together with the G. If there is an idle M at that moment, the P binds to it and continues to execute the other Gs; if there is no idle M but there are still Gs to execute, the Go runtime creates a new M (thread).
When the system call returns, the G tries to acquire an idle P; if it succeeds, it is put into that P's local queue and keeps running. If no P is available, the thread M goes dormant and joins the idle-thread list, and the G is put into the global queue.
7. When a new goroutine is created, or blocked goroutines become runnable again, the runtime first checks whether there is already a spinning M. If there is no spinning M but there is an M in the idle list, it wakes up an idle M; otherwise it creates a new M.
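A minimal sketch of the channel case in step 5 (the value and sleep durations are made up): the receiving goroutine blocks and is parked by the runtime, its M is free to run other Gs, and the later send makes it runnable again.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int)

	go func() {
		v := <-ch // blocks: this G is parked in the channel's wait queue,
		// and its M is free to run other Gs in the meantime
		fmt.Println("received", v)
	}()

	time.Sleep(100 * time.Millisecond) // pretend to do something else
	ch <- 42                           // wakes the parked G; it becomes runnable again

	time.Sleep(100 * time.Millisecond) // give it time to print before main exits
}
```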
4. The life cycle of the scheduler
Special M0 and G0
M0
M0 is the main thread, numbered 0, that starts the program. M0 is responsible for performing the initialization operations and starting the first G; after that, it behaves like any other M.
G0
G0 is the first goroutine created every time an M is started. G0 is used only for scheduling; it does not point to any executable user function, and every M has its own G0. G0's stack is used during scheduling and during system calls. The g0 global variable is the G0 of M0.
The running process of the scheduler is as follows:
- The runtime creates the initial thread m0 and goroutine g0, and associates the two.
- Scheduler initialization: Initialize m0, stack, garbage collection, and create and initialize a P-list consisting of GOMAXPROCS Ps.
- There is also a main function in the runtime, runtime.main; after compilation, runtime.main calls main.main. When the program starts, a goroutine is created for runtime.main (call it the main goroutine), and the main goroutine is then added to P's local queue.
- Start m0. m0 is already bound to a P, so it gets a G from P's local queue and obtains the main goroutine.
- G owns a stack, and M sets up the running environment according to the stack and scheduling information in G.
- M runs G.
- When G exits, M obtains another runnable G from P, and this repeats until main.main exits; runtime.main then performs defer and panic handling and calls runtime.exit to terminate the program.
The life cycle of the scheduler spans almost the entire life of a Go program: all the work done before runtime.main's goroutine runs is preparation for the scheduler; the scheduler truly starts running when runtime.main's goroutine runs, and it ends when runtime.main returns.
5. Summary
The essence of the Go scheduler is to allocate a large number of goroutine tasks to a small number of threads to run, and use multi-core parallelism to achieve more powerful concurrency.
References:
Also talk about goroutine scheduler
Golang's coroutine scheduler principle and GMP design idea
goroutine scheduler
In-depth decryption of the scheduler of the Go language