PART - 0 A small innovation
There are many small but innovative basic modules in the open source project Workflow. Today we introduce the most commonly used traditional data structure among them: the multi-producer, multi-consumer message queue.
To be clear, the message queue discussed here is not a message queue service such as Kafka, but a traditional single-machine data structure. It is mainly used to coordinate and schedule machine resources, such as thread/coroutine scheduling and asynchronous network sending and receiving; in other words, it can coordinate execution resources and can also serve as a temporary buffer for data resources.
Some time ago I introduced a 300-line thread pool, and a 200-line Executor for computation scheduling built on top of it. The msgqueue introduced today is simpler and even more commonly used, at less than 200 lines of code. The msgqueue module is also very independent and can be used (played with) directly.
👉 One-sentence summary: two queues are used internally to separate the contention among producers from the contention among consumers. While improving throughput, it still maintains an excellent long tail, and it preserves ordering with minimal code, in the best tradition; the extremely economical linked-list implementation also has something to teach about reducing memory allocations.
Code location: https://github.com/sogou/workflow/blob/master/src/kernel/msgqueue.c
PART - 1 Common implementations of message queues
Longtime followers may know that before the project was open-sourced, I wrote a series of articles comparing message queues. Below are some common implementations I have seen, code that I think is great; additions in the comments are welcome:
- Simple and rough version: the basic implementation with one lock and two condition variables; it is simple enough to write by hand (see the sketch after this list)~
- Double-queue version: Workflow's msgqueue
https://github.com/sogou/workflow
- Linked-list version: LinkedBlockingQueue
https://developer.android.com/reference/java/util/concurrent/BlockingQueue
(this BlockingQueue family has quite a lot of implementations)
- grpc version: mpmc queue
https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/executor/mpmcqueue.cc
- Lock-free version: the kernel's single-producer single-consumer kfifo
https://github.com/torvalds/linux/blob/master/lib/kfifo.c
- Unordered version: Go's work-stealing scheduler
https://github.com/golang/go/blob/master/src/runtime/proc.go
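As promised above, here is a minimal sketch of the "simple and rough" version: one mutex shared by everyone plus two condition variables, written against pthreads (the simple_queue / simple_put / simple_get names are mine, not from any project listed here). Every message costs an extra node allocation, and all producers and consumers contend on the same lock, which is exactly what msgqueue's double-queue design avoids:

#include <pthread.h>
#include <stdlib.h>

struct node { void *msg; struct node *next; };

struct simple_queue {
    struct node *head;
    struct node *tail;
    size_t cnt;
    size_t max;
    pthread_mutex_t mutex;   // one lock shared by producers and consumers
    pthread_cond_t get_cond; // consumers wait here when the queue is empty
    pthread_cond_t put_cond; // producers wait here when the queue is full
};

#define SIMPLE_QUEUE_INIT(maxlen) \
    { NULL, NULL, 0, (maxlen), PTHREAD_MUTEX_INITIALIZER, \
      PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER }

void simple_put(struct simple_queue *q, void *msg)
{
    struct node *n = malloc(sizeof *n); // one extra allocation per message (error handling omitted)
    n->msg = msg;
    n->next = NULL;
    pthread_mutex_lock(&q->mutex);
    while (q->cnt >= q->max)
        pthread_cond_wait(&q->put_cond, &q->mutex);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->cnt++;
    pthread_mutex_unlock(&q->mutex);
    pthread_cond_signal(&q->get_cond);
}

void *simple_get(struct simple_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    while (q->cnt == 0)
        pthread_cond_wait(&q->get_cond, &q->mutex);
    struct node *n = q->head;
    q->head = n->next;
    if (!q->head)
        q->tail = NULL;
    q->cnt--;
    pthread_mutex_unlock(&q->mutex);
    pthread_cond_signal(&q->put_cond);
    void *msg = n->msg;
    free(n);     // and one extra free per message
    return msg;
}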
All of the implementations above are not only worth reading, but also worth implementing yourself when you have spare time~
PART - 2 The msgqueue algorithm
Workflow's msgqueue is very simple; two pictures are enough to explain its internal structure and flow:
Several features:
- There are two lists inside: producers put messages onto the production queue, and consumers take messages from the consumption queue;
- Two locks manage the two queues respectively;
- Two condition variables manage the waiting and waking of producers and consumers respectively;
- The queue has two modes, block / nonblock;
- In block mode, the production queue's maximum length is maxlen; in nonblock mode, the length is unlimited;
The algorithm takes only two simple steps:
- When get_list (the consumption queue) is not empty, a consumer can take a message from it;
- Otherwise, the consumer waits until put_list (the production queue) is not empty, then swaps the two queues;
This performs very well when the queue is busy and there are many consumers~
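In rough text form, the two-queue structure and the swap described above look like this (a sketch of my own based on the description):

producers --put()--> [ put_head -> msg -> msg -> ... <- put_tail ]  guarded by put_mutex
consumers <--get()-- [ get_head -> msg -> ... ]                     guarded by get_mutex
                     when the get list runs dry, one consumer locks
                     both sides and swaps the two list heads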
A long time ago, I put some very simple benchmark data in my personal queue project: GitHub - holmes1412/queue: some different implements of queue and test. It is far from perfect, for reference only; after all, my IQ tops out here._.
I also recommend the following small project to see how msgqueue can refactor a thread pool for a leap in performance: https://github.com/Barenboim/msgqueue_thrdpool
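To give a feel for that refactor, here is a rough worker-loop sketch of my own (not code from msgqueue_thrdpool); struct task and its fields are hypothetical:

#include <stddef.h>   // offsetof
#include "msgqueue.h"

struct task {
    void (*routine)(void *);
    void *context;
    void *link;   // reserved slot: create the queue with linkoff = offsetof(struct task, link)
};

void *worker(void *arg)
{
    msgqueue_t *queue = (msgqueue_t *)arg;
    struct task *t;

    // In blocking mode msgqueue_get() waits for messages; after
    // msgqueue_set_nonblock() it returns NULL on an empty queue,
    // which doubles as the exit signal for this loop.
    while ((t = (struct task *)msgqueue_get(queue)) != NULL)
        t->routine(t->context);

    return NULL;
}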
PART - 3 Code details
Following the universal seven steps introduced before, we can walk through these hundred-plus lines of code and learn the queue once more~
Step 1: Look at the interface through the header file
Open msgqueue.h:
msgqueue_t *msgqueue_create(size_t maxlen, int linkoff);
void msgqueue_put(void *msg, msgqueue_t *queue);
void *msgqueue_get(msgqueue_t *queue);
void msgqueue_set_nonblock(msgqueue_t *queue);
void msgqueue_set_block(msgqueue_t *queue);
void msgqueue_destroy(msgqueue_t *queue);
- msgqueue_create() creates a message queue.
- The parameter maxlen is the maximum length of the production queue; in the default (blocking) mode, a producer blocks once this length is reached.
- The second parameter linkoff is a highlight of this module. It lets the user specify an offset within each message, where a pointer-sized slot must be reserved for the queue's internal chaining. This simple design saves one extra memory allocation and release every time a message enters and leaves the queue.
If this explanation is not clear enough, you can also read the author's comments in the module.
Once msgqueue_create() is understood, msgqueue_get() and msgqueue_put() need little further introduction. Just remember that a pointer-sized slot must be reserved at the linkoff position of each msg.
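As a concrete illustration, here is a minimal usage sketch (my own example, not from the Workflow repo). Each message's memory layout is simply [ user fields ... | void *link ], with the link slot at offset linkoff; the queue chains messages through that slot instead of allocating list nodes of its own:

#include <stddef.h>   // offsetof
#include <stdio.h>
#include "msgqueue.h"

struct my_msg {
    int data;
    void *link;   // reserved for the queue's internal chaining
};

int main(void)
{
    msgqueue_t *queue = msgqueue_create(1024, offsetof(struct my_msg, link));
    struct my_msg msg = { .data = 42 };

    msgqueue_put(&msg, queue);                // the queue writes into msg.link
    struct my_msg *out = msgqueue_get(queue); // the same pointer comes back out
    printf("%d\n", out->data);                // prints 42

    msgqueue_destroy(queue);
    return 0;
}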
Step 2: Data structures on the .h interface
The msgqueue_t in the interface above is the real body of the message queue; it is implemented inside msgqueue.c.
typedef struct __msgqueue msgqueue_t;
Step 3: Internal data structure of the .c file
Next comes the exciting part. For convenience, I directly use void ** pointers to build the linked list. One advantage of doing this:
It makes full use of the msg memory allocated by the user, so the queue itself never needs to allocate or free any space (I'm really a clever little ghost (๑´ ▽ `๑)ノ
Of course, exactly how this is achieved is not that important; no need to agonize over it~
struct __msgqueue
{
    size_t msg_max;            // maximum length of the production queue
    size_t msg_cnt;            // current length of the production queue
    int linkoff;               // offset of the link pointer inside each msg
    int nonblock;              // 0: blocking mode (default); 1: nonblocking
    void *head1;               // storage for one list head
    void *head2;               // storage for the other list head
    void **get_head;           // consumption queue (points at head1 or head2)
    void **put_head;           // production queue head
    void **put_tail;           // production queue tail
    pthread_mutex_t get_mutex; // consumers contend only on this lock
    pthread_mutex_t put_mutex; // producers contend only on this lock
    pthread_cond_t get_cond;   // a consumer waits here when both lists are empty
    pthread_cond_t put_cond;   // producers wait here when the production queue is full
};
This is exactly what was described earlier:
- Two internal queues: get_head, put_head;
- Two locks: get_mutex, put_mutex;
- Two condition variables: get_cond, put_cond;
- msg_max and msg_cnt are easy to understand: the maximum allowed length of the internal production queue, and its current actual length, respectively;
- nonblock obviously indicates whether the queue is in nonblocking mode. For simplicity, the code below is discussed in blocking mode only;
- There is a put_tail but no get_tail: consumers simply take from the head, while producers must append through both head and tail to preserve the global order of messages;
- linkoff has already been introduced; it is the key to computing the offsets of the internal linked list.
Step 4: Look at the implementation of the interface
Let's look at msgqueue_create() first; it is basically enough to show how the internal data is managed:
msgqueue_t *msgqueue_create(size_t maxlen, int linkoff)
{
    // After various initialization, the queue's members end up as follows:
    queue->msg_max = maxlen;
    queue->linkoff = linkoff;
    queue->head1 = NULL;
    queue->head2 = NULL;
    // The two heads serve as the anchors of the two internal queues
    queue->get_head = &queue->head1;
    queue->put_head = &queue->head2;
    // The queue starts out empty, so the production tail equals the production head
    queue->put_tail = &queue->head2;
    queue->msg_cnt = 0;
    queue->nonblock = 0;
    ...
}
The linkoff passed to msgqueue_create() tells the queue where, inside each message, the link pointer lives; the queue then uses (char *)msg + linkoff to locate that slot and chain the next message.
Then look at the producer interface msgqueue_put():
void msgqueue_put(void *msg, msgqueue_t *queue)
{
    // 1. Use the linkoff passed in at create time to locate the link slot inside the message
    void **link = (void **)((char *)msg + queue->linkoff);
    // 2. Set it to NULL: nothing follows the tail of the production queue
    *link = NULL;
    // 3. Take the producer lock
    pthread_mutex_lock(&queue->put_mutex);
    // 4. If there are already msg_max messages,
    //    wait for a consumer to wake me up via put_cond
    while (queue->msg_cnt > queue->msg_max - 1 && !queue->nonblock)
        pthread_cond_wait(&queue->put_cond, &queue->put_mutex);
    // 5. Chain the message in at put_tail and maintain the production queue's count
    *queue->put_tail = link;
    queue->put_tail = link;
    queue->msg_cnt++;
    pthread_mutex_unlock(&queue->put_mutex);
    // 6. If a consumer is waiting, wake it up via get_cond~
    pthread_cond_signal(&queue->get_cond);
}
Note the head-pointer trick in step 5: when the production queue is empty, put_tail points at the list-head slot itself (initially &head2), so *queue->put_tail = link writes the first message straight into the head; afterwards put_tail points at that message's link slot, ready for the next put. Correspondingly, the consumer interface msgqueue_get():
void *msgqueue_get(msgqueue_t *queue)
{
    void *msg;
    // 1. Take the consumer lock
    pthread_mutex_lock(&queue->get_mutex);
    // 2. If get_head is currently non-empty, there is data;
    //    if it is empty, swap the two queues via __msgqueue_swap(), which may also yield data
    if (*queue->get_head || __msgqueue_swap(queue) > 0)
    {
        // 3. Mirror the computation in put: subtract linkoff from the link slot
        //    to recover the start address of the message
        msg = (char *)*queue->get_head - queue->linkoff;
        // 4. Move forward: *get_head now holds the link slot of the next message
        *queue->get_head = *(void **)*queue->get_head;
    }
    else
    {
        // 5. No data; also set errno~~~
        msg = NULL;
        errno = ENOENT;
    }
    pthread_mutex_unlock(&queue->get_mutex);
    return msg;
}
Step 5: Implementation of other core functions
There is, of course, one more core function: __msgqueue_swap(), the key to the queue-swapping algorithm:
static size_t __msgqueue_swap(msgqueue_t *queue)
{
    // 1. Record the current get-queue head in a temporary variable
    void **get_head = queue->get_head;
    size_t cnt;
    // 2. Hand the production queue over to the consumer side
    queue->get_head = queue->put_head;
    // 3. This is the only place where the consumer lock and the producer lock are held at the same time
    pthread_mutex_lock(&queue->put_mutex);
    // 4. If the queue itself is empty,
    //    wait for the next arriving producer to wake me up via get_cond
    while (queue->msg_cnt == 0 && !queue->nonblock)
        pthread_cond_wait(&queue->get_cond, &queue->put_mutex);
    cnt = queue->msg_cnt;
    // 5. If the queue is full, some producers may be waiting;
    //    wake them via put_cond (possibly several, hence broadcast)
    if (cnt > queue->msg_max - 1)
        pthread_cond_broadcast(&queue->put_cond);
    // 6. Give the old get queue recorded in step 1 back to the producers and reset it to empty
    queue->put_head = get_head;
    queue->put_tail = get_head;
    queue->msg_cnt = 0;
    pthread_mutex_unlock(&queue->put_mutex);
    // 7. Return how many messages we took; this drives the logic in get
    return cnt;
}
Step 6: Tying the functions together
The interfaces of this module are simple, and put and get have already been connected along the producer and consumer flows. To deepen understanding, let's take a different angle and organize things by the two locks and two condition variables:
- put_mutex: producers contend for the put lock among themselves; only when the consumption queue is empty and a consumer wants to swap is the put lock also taken by a consumer;
- get_mutex: consumers contend for the get lock among themselves; when the consumption queue is empty, the consumer holding the get lock enters swap to exchange the queues;
- put_cond: a producer that cannot get space waits on the put condition variable; the consumer that has just swapped the queues wakes up zero or more such producers;
- get_cond: as described above, when the consumption queue is empty one consumer enters swap, where it first checks whether the production queue holds anything; if the production queue is empty too, that consumer waits on the get condition variable until the next producer wakes it up;
Step 7: Other processes
For such a simple msgqueue, the only remaining flow is setting nonblock. Some of the behaviors after nonblock is set deserve attention:
void msgqueue_set_nonblock(msgqueue_t *queue)
{
    queue->nonblock = 1;
    pthread_mutex_lock(&queue->put_mutex);
    // Wake up one consumer
    pthread_cond_signal(&queue->get_cond);
    // Wake up all producers
    pthread_cond_broadcast(&queue->put_cond);
    pthread_mutex_unlock(&queue->put_mutex);
}
Notice that the choice of signal versus broadcast here matches how get_cond and put_cond are used elsewhere: at most one consumer can be waiting on get_cond (the one inside swap), while several producers may be waiting on put_cond.
We have just seen the two places where nonblock is checked, and both were explained in terms of blocking mode. So what does the flow look like once nonblock is set? I won't go into the details and will leave it as an after-class exercise →_→ (it's not that the author is lazy
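As a small hint for that exercise, here is the typical shutdown pattern (my own sketch, based on the semantics shown above): once nonblock is set, blocked consumers wake up, and msgqueue_get() returns NULL with errno == ENOENT on an empty queue, so worker loops like the one in PART 2 simply fall through and exit:

msgqueue_set_nonblock(queue); // wake everyone; gets now return NULL when empty
// ... pthread_join() the consumer threads, then:
msgqueue_destroy(queue);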
PART - 4 Summary
Having written this many articles, perhaps you have noticed that Workflow contains many new ideas, new algorithms, and new data structures. Not all of them are big innovations like the Executor; many subtle spots carry small practical innovations that add up to substantial performance gains.
This is also my biggest takeaway from participating in this project: traditional, plain code is still worth pondering, optimizing, and polishing toward excellence.
Another bit of engineering experience shows up throughout these articles: engineering practice is full of trade-offs, such as throughput versus latency in a message queue, where even a little extra overhead is visible in request latency.
Objectively speaking, the msgqueue implementation is very restrained and extremely simplified. There are many other message queue implementations out there, so why doesn't Workflow use a more complex, more efficient data structure? The reasons all come down to generality:
- Throughput versus latency: more complex algorithms often bring overall gains, but can those gains outweigh the overhead the added complexity introduces in every scenario? Or is a general-purpose framework better served by a general-purpose model?
- Analyze the scenario's bottleneck: for Workflow's actual network sending and receiving, the bottleneck falls on msgqueue only when messages are very small, QPS is very high, and no serialization/deserialization or business computation is involved;
But we are always open to new practices ^^ (well, I was simply too busy to dig deeper before). When I find the time I will try swapping alternatives into Workflow, and hopefully that will spark some new ideas~~~