Original content; please credit the source when reprinting.
Foreword
A delayed message (timed message) is a message that, in a distributed asynchronous messaging scenario, the producer sends with the expectation that it will be consumed after a specified delay or at a specified time, rather than immediately.
Delayed messages apply to a wide range of business scenarios. In a distributed system, the delayed-message capability usually sinks into the middleware layer: it is either built into the MQ itself or packaged as a shared basic service.
This article discusses the common implementation schemes for delayed messages and the trade-offs in their design.
Implementation Schemes
1. Solutions based on external storage
The external storage discussed here refers to storage systems introduced in addition to the storage that comes with the MQ itself.
Solutions based on external storage all follow the same pattern: the delayed-message module is separated from the MQ as an independent service/process. Delayed messages are first kept in another storage medium and then delivered to the MQ when they expire. There are also some detail optimizations, such as delivering a message directly if it has already expired when it enters the delayed-message module, which will not be discussed here.
The difference between the following solutions is which storage system is used.
Based on a database (e.g. MySQL)
This scheme uses a delayed-message table in a relational database (such as MySQL).
```sql
CREATE TABLE `delay_msg` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `delivery_time` DATETIME NOT NULL COMMENT 'delivery time',
  `payloads` blob COMMENT 'message payload',
  PRIMARY KEY (`id`),
  KEY `time_index` (`delivery_time`)
)
```
A timing thread scans the table regularly for expired messages and then delivers them. The scan interval is, in theory, the minimum time precision of your delayed messages.
Advantages:
- Simple to implement;
Disadvantages:
- A B+Tree index is not well suited to the write-heavy workload of messaging scenarios;
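The scan-and-deliver loop of this table-based scheme can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for MySQL; the `deliver` callback, which would produce to the real MQ topic, is an assumption of the sketch.

```python
import sqlite3
import time

# In-memory SQLite stands in for MySQL here; the scan logic is identical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE delay_msg (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        delivery_time REAL NOT NULL,   -- epoch seconds when the message is due
        payload BLOB
    )""")
conn.execute("CREATE INDEX time_index ON delay_msg (delivery_time)")

def put(delay_seconds, payload):
    """Producer side: store the message with its absolute delivery time."""
    conn.execute("INSERT INTO delay_msg (delivery_time, payload) VALUES (?, ?)",
                 (time.time() + delay_seconds, payload))

def scan_and_deliver(deliver):
    """One tick of the timing thread: fetch due rows, hand them to MQ, delete them."""
    now = time.time()
    due = conn.execute(
        "SELECT id, payload FROM delay_msg WHERE delivery_time <= ?", (now,)).fetchall()
    for msg_id, payload in due:
        deliver(payload)                  # would produce to the real MQ topic
        conn.execute("DELETE FROM delay_msg WHERE id = ?", (msg_id,))
    conn.commit()
    return len(due)

put(-0.01, b"already due")   # negative delay: due immediately
put(3600, b"due in an hour")
delivered = []
scan_and_deliver(delivered.append)
```

The scan interval at which `scan_and_deliver` is invoked bounds the delay precision, as noted above.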
Based on RocksDB
The RocksDB solution is essentially the scheme above with a more suitable storage medium.
As discussed in the author's previous article on RocksDB, an LSM tree is better suited to write-heavy workloads. Chronos, the delayed-message module in Didi's open-source DDMQ, adopts this solution.
In a nutshell, DDMQ adds a unified proxy layer on top of RocketMQ, and this proxy layer allows extensions along several functional dimensions. For delayed messages, the proxy layer does the forwarding: a delayed message is first delivered to Chronos's dedicated topic in RocketMQ. The delayed-message module Chronos consumes these messages and dumps them into RocksDB; the rest follows the familiar logic of regularly scanning for expired messages and delivering them back to RocketMQ.
Honestly, this is a relatively heavy scheme: if you build on RocksDB, then from the perspective of data availability you also have to handle multi-replica data synchronization yourself.
Advantages:
- RocksDB's LSM tree is well suited to the write-heavy workload of messaging scenarios;
Disadvantages:
- The scheme is heavy to implement: if you adopt it, you need to build RocksDB's data disaster-recovery logic yourself;
Based on Redis
Now for the Redis solution. Below is a relatively complete scheme.
This scheme comes from: https://www.cnblogs.com/lylife/p/7881950.html
- Messages Pool stores all delayed messages in a KV structure: the key is the message ID and the value is the message body. (A Redis Hash is chosen here mainly because it can hold a large amount of data, performs progressive rehash when it grows, and both HSET and HGET are O(1).)
- Delayed Queue consists of 16 sorted queues (the number of queues can be scaled horizontally). The structure is a ZSET whose value is the message ID in the Messages Pool and whose score is the expiration time. (Splitting into multiple queues speeds up scanning.)
- Worker is the processing thread, which scans the Delayed Queues for expired messages via scheduled tasks.
In my opinion, this scheme chooses Redis for several reasons:
- Redis's ZSET is a natural fit for implementing delayed queues;
- Performance: although ZSET insertion is an O(log n) operation, Redis operates in memory and is heavily optimized internally.
However, this scheme still leaves some open questions. It meets concurrency requirements by creating multiple Delayed Queues, but that raises the problem of distributing the queues evenly across multiple nodes, and expired messages may well be processed concurrently and repeatedly, so concurrency control such as distributed locks may need to be introduced.
In small-scale scenarios, the architecture above can be degenerated into a master-slave setup where only the master node processes tasks and the slave exists purely as a disaster-recovery backup; this is simpler and more controllable.
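The Messages Pool / Delayed Queue interaction can be sketched in plain Python. This is an in-memory model of the Redis structures, not actual Redis client code; the comments note the Redis commands each step would correspond to, and all names are illustrative.

```python
import bisect
import time

NUM_QUEUES = 16  # the scheme shards delayed messages across 16 ZSETs

messages_pool = {}                                # models the Redis Hash: id -> payload
delayed_queues = [[] for _ in range(NUM_QUEUES)]  # each: sorted list of (score, id), like a ZSET

def add_delayed(msg_id, payload, delay_seconds):
    messages_pool[msg_id] = payload                      # HSET messages_pool msg_id payload
    queue = delayed_queues[hash(msg_id) % NUM_QUEUES]
    bisect.insort(queue, (time.time() + delay_seconds, msg_id))  # ZADD queue score msg_id

def worker_scan(queue_index, deliver, now=None):
    """One worker pass over one queue: pop due ids, look up payloads, deliver."""
    now = time.time() if now is None else now
    queue = delayed_queues[queue_index]
    while queue and queue[0][0] <= now:                  # ZRANGEBYSCORE queue -inf now
        _, msg_id = queue.pop(0)                         # ZREM after successful delivery
        deliver(messages_pool.pop(msg_id))               # HGET then HDEL

add_delayed("a", "due now", -1)      # score already in the past
add_delayed("b", "later", 3600)
delivered = []
for i in range(NUM_QUEUES):
    worker_scan(i, delivered.append)
```

In the real scheme, multiple workers on multiple nodes would each scan a subset of the 16 queues, which is exactly where the distribution and duplicate-processing concerns above come from.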
Defects of timed-thread scanning and an improvement
All the schemes above obtain expired messages by having a timing thread scan the storage regularly.
The timing-thread approach wastes resources when the message volume is small, and when the volume is very large, an ill-chosen scan interval makes the delay time inaccurate. Borrowing the idea from the JDK's Timer class, CPU resources can be saved with wait/notify.
Fetch the earliest delayed message and wait for (execution time - current time); no resources are wasted while waiting, and the thread wakes automatically when the message is due. If a new message arrives that is due earlier than the one being waited on, notify wakes the thread, which re-fetches the earlier message, waits again, and so on.
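The wait/notify idea can be sketched with a condition variable over a min-heap, in the spirit of JDK's Timer. This Python sketch is illustrative, not any MQ's actual code:

```python
import heapq
import threading
import time

class DelayQueue:
    """Timer-style scheduler: the worker waits exactly until the earliest
    message is due; adding an earlier message notifies it to re-check."""

    def __init__(self):
        self._heap = []                       # min-heap of (deliver_at, payload)
        self._cond = threading.Condition()

    def put(self, delay_seconds, payload):
        with self._cond:
            heapq.heappush(self._heap, (time.time() + delay_seconds, payload))
            self._cond.notify()               # wake the worker: the head may have changed

    def take(self):
        """Block until the earliest message is due, then return its payload."""
        with self._cond:
            while True:
                if not self._heap:
                    self._cond.wait()         # nothing scheduled: sleep until notified
                    continue
                deliver_at, payload = self._heap[0]
                timeout = deliver_at - time.time()
                if timeout <= 0:
                    heapq.heappop(self._heap)
                    return payload
                self._cond.wait(timeout)      # sleep until due, or until a new head arrives

q = DelayQueue()
q.put(0.2, "second")
q.put(0.05, "first")      # earlier message; notify makes take() re-check the head
got = [q.take(), q.take()]
```

No CPU is burned between deliveries: the thread sleeps for exactly the remaining time of the head message, and only a `put` of an earlier message interrupts that sleep.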
2. Implementations in open-source MQs
Next, let's look at the open-source MQs that ship a delayed-message feature, and how they implement it.
RocketMQ
The open-source version of RocketMQ supports delayed messages, but only 18 fixed delay levels rather than arbitrary times. The levels are customizable in RocketMQ, which is fortunately enough for ordinary business. The default is "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h", i.e. 18 levels.
In RocketMQ's terms, a message with a delay level set is temporarily stored in a topic named SCHEDULE_TOPIC_XXXX, in a queue chosen by its level: queueId = delayTimeLevel - 1. Each queue holds only messages with the same delay, which guarantees that messages with the same delay are consumed in order. The broker consumes SCHEDULE_TOPIC_XXXX on schedule and writes expired messages to their real topics.
The following is a schematic diagram of the whole scheme; red represents delivery of a delayed message, and purple represents a delayed message found due by the scheduled scan:
Advantages:
- The number of levels is fixed and each level has its own timer, so the overhead is small;
- Putting messages of the same level into the same queue guarantees ordering within a level; putting different levels into different queues guarantees delivery-time accuracy;
- By supporting only fixed levels, sorting delayed messages degenerates into an extra write to the fixed-level topic;
Disadvantages:
- Changing the level configuration is costly, and fixed levels are inflexible;
- The CommitLog can grow very large because of delayed messages;
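For illustration, here is a small sketch of how the default level string maps to delays and queue ids. The parsing helper is hypothetical, but the default level string and the queueId = delayTimeLevel - 1 rule come from the description above.

```python
# Hypothetical helper mirroring RocketMQ's delay-level handling.
DEFAULT_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h"
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_delay_levels(spec=DEFAULT_LEVELS):
    """Return the delay in seconds for each level (levels are 1-based,
    so levels[0] is level 1)."""
    return [int(tok[:-1]) * UNITS[tok[-1]] for tok in spec.split()]

def queue_id(delay_level):
    # In SCHEDULE_TOPIC_XXXX, each level gets its own queue.
    return delay_level - 1

levels = parse_delay_levels()
```

Because the level set is closed, the broker needs one timer per queue and never has to sort individual delivery times.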
Pulsar
Pulsar supports delayed messages at "any time", but its implementation differs from RocketMQ's.
In plain terms, a Pulsar delayed message goes directly into the topic the client sends to; the broker then maintains a time-ordered priority queue in off-heap memory holding the delayed messages' index information. The message with the shortest delay sits at the head, and the longer the delay, the further back it sits. On the consume path, the broker checks whether any messages are due; if so, it pops them from the queue and uses the index to look up the actual messages for consumption.
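That off-heap index can be modeled as a min-heap keyed by delivery time. The following sketch is illustrative: the (ledger_id, entry_id) index fields follow Pulsar's storage model, but the class and method names are assumptions, not Pulsar's API. Note that only the small index tuples live in the queue, never the payloads.

```python
import heapq

class DelayedIndexQueue:
    """Model of a per-subscription delayed-message index queue."""

    def __init__(self):
        self._heap = []   # min-heap of (deliver_at, ledger_id, entry_id)

    def add(self, deliver_at, ledger_id, entry_id):
        heapq.heappush(self._heap, (deliver_at, ledger_id, entry_id))

    def pop_due(self, now):
        """Called on the consume path: return the indexes of all due messages."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, ledger_id, entry_id = heapq.heappop(self._heap)
            due.append((ledger_id, entry_id))   # broker reads the real message by index
        return due

idx = DelayedIndexQueue()
idx.add(deliver_at=150.0, ledger_id=7, entry_id=2)
idx.add(deliver_at=100.0, ledger_id=7, entry_id=1)
due = idx.pop_due(now=120.0)
```

One such queue exists per subscription, which is exactly why the memory and rebuild costs below scale with the number of subscriptions and the message backlog.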
If a broker node crashes, the topics on it are transferred to other available brokers, and the priority queues mentioned above are rebuilt.
The following is a schematic diagram of Pulsar delayed messages from Pulsar's official WeChat account.
At first glance the scheme looks very simple, and it supports messages at any time. But it has several significant problems:
- Memory overhead: the delayed-message index queue lives in off-heap memory, and one queue exists per subscription (consumer group, in Kafka terms). If your topic has N subscriptions and uses delayed messages, N queues are created; and as the number of delayed messages and their time span grow, each queue's memory usage grows too. (Yes, under this scheme, supporting arbitrary-time delays actually makes this flaw worse.)
- Rebuild time of the index queue after failover: for a large volume of delayed messages with a long span, the rebuild can take on the order of hours. (From the Pulsar official account article.)
- Storage overhead: the time span of delayed messages affects the reclamation of consumed message data in Pulsar. If your topic needs month-long delays and you send a message delayed by one month, the topic's underlying storage retains a whole month of message data, even if 99% of the normal messages in that month have been consumed.
For the first two problems, the community has designed an improvement that partitions the queue by time: the broker loads only the current time slice's queue into memory and persists the remaining time-slice partitions to disk. A schematic is shown below:
However, this design has no corresponding implementation version yet. In actual use, you can restrict delayed messages to small time spans to reduce the impact of the first two defects. Moreover, because memory holds only the index rather than the full delayed messages, it likely takes a backlog of millions of delayed messages before memory is significantly affected; from that perspective, it is understandable that the first two problems have not been fixed yet.
The third problem is presumably harder to solve: it requires distinguishing delayed messages from normal messages at the data storage layer and storing delayed messages separately.
QMQ
QMQ provides delayed/timed messages at any time; you can specify delivery at any time within the next two years (configurable).
I put QMQ last because I think it has the most reasonable delayed-message design among open-source MQs. The core of the design, in short, is a multi-level time wheel + lazy loading + separate disk storage for delayed messages.
If you are not familiar with time wheels, see the author's earlier article, "Looking at the time wheel algorithm design from Kafka".
QMQ implements delayed/timed messages with a two-layer hash wheel. The first layer sits on disk: each hour is one tick (one hour per tick by default, adjustable in configuration). Each tick generates one log file (the schedule log); since QMQ supports delays of up to two years by default (also configurable), at most 2 * 366 * 24 = 17568 files are generated (fewer if the maximum delay you need is shorter). The second layer sits in memory: as a message's delivery time approaches, that hour's message indexes (each index contains the message's offset and size in the schedule log) are loaded from the disk file into the in-memory hash wheel, which uses 500 ms as its scale.
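The two-layer structure can be modeled roughly as follows. Dicts stand in for the schedule-log files and the in-memory wheel, and all names are illustrative rather than QMQ's actual implementation:

```python
from collections import defaultdict

TICK = 0.5          # the in-memory hash wheel uses a 500 ms scale
HOUR = 3600         # the disk layer uses one hour per tick by default

schedule_log = defaultdict(list)   # disk layer: hour bucket -> [(deliver_at, msg)]
memory_wheel = defaultdict(list)   # memory layer: 500 ms slot -> [msg], for loaded hours

def add_message(deliver_at, msg):
    # First layer: append to the schedule-log "file" for that hour.
    schedule_log[int(deliver_at // HOUR)].append((deliver_at, msg))

def load_hour(hour_bucket):
    # Lazy loading: when an hour approaches, move its indexes into the memory wheel.
    for deliver_at, msg in schedule_log.pop(hour_bucket, []):
        memory_wheel[int(deliver_at // TICK)].append(msg)

def tick(slot):
    # Every 500 ms the wheel advances one slot and delivers everything in it.
    return memory_wheel.pop(slot, [])

add_message(3600.2, "a")   # hour bucket 1, wheel slot 7200
add_message(3601.0, "b")   # same hour, wheel slot 7202
add_message(7300.0, "c")   # a later hour, stays on "disk" until loaded
load_hour(1)
delivered = tick(int(3600.2 // TICK))
```

Only one hour's worth of indexes ever sits in memory, while messages further out remain in their per-hour schedule logs on disk, which is the memory-friendliness highlighted below.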
To summarize the design highlights:
- The time wheel algorithm fits delayed/timed message scenarios: it eliminates the need to sort delayed messages, and insertion and deletion are O(1);
- The multi-level time wheel design supports delayed messages with a large time span;
- With lazy loading, only messages due soon are kept in memory while longer delays stay on disk, which is memory-friendly;
- Delayed messages are stored separately (the schedule log), so they do not affect space reclamation for normal messages;
Summary
This article has summarized the common delayed-message schemes in the industry and discussed the pros and cons of each. I hope it inspires readers.