Blog recommendation | Apache Pulsar delayed message delivery analysis - ApachePulsar

This article was written by Zhang Chao and originally published on the public account "Tencent Cloud Middleware". It is reprinted with authorization and has been lightly edited from the original.

About Apache Pulsar
Apache Pulsar is a top-level project of the Apache Software Foundation. It is a next-generation cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its architecture separates computing from storage, and it supports multi-tenancy, persistent storage, and multi-datacenter and cross-region data replication, with the strong consistency, high throughput, low latency, and high scalability expected of streaming data storage.
GitHub address: http://github.com/apache/pulsar/

Apache Pulsar is a multi-tenant, high-performance solution for service-to-service messaging. It offers low latency, separation of reads and writes, cross-region replication, rapid capacity expansion, and flexible fault tolerance. The MQ team of the Tencent Data Platform Department has studied Pulsar in depth and made many performance and stability optimizations, and the result has been launched as Tencent Cloud's message queue service, TDMQ. This article introduces how Pulsar implements delayed message delivery, and discussion is welcome.

What is delayed message delivery

Delayed message delivery is common in MQ application scenarios. A message is not delivered immediately after it is sent to the MQ server. Instead, it is delivered to the consumer after a delay determined by attributes in the message. Such messages are generally divided into timed messages and delayed messages:

Timed message: The Producer sends the message to the MQ server but does not want it delivered immediately. Delivery to the Consumer is postponed until a specified point in time after the current time.
Delayed message: The Producer sends the message to the MQ server but does not want it delivered immediately. The message is delivered to the Consumer after a specified length of time has passed.

In the industry today, Tencent Cloud's CMQ and Alibaba Cloud's RocketMQ both support delayed message delivery:

CMQ: The period during which a message is delayed is called the "in-flight" state. The delay is configured by setting DelaySeconds, with a value range of 0-3600 seconds, so a message can remain invisible for at most 1 hour.
RocketMQ: In the open source version, delayed messages are temporarily stored in an internal topic, and only specific delay levels are supported, such as 5s, 10s, and 1m. The commercial version supports arbitrary time precision.

The open source NSQ, RabbitMQ, ActiveMQ, and Pulsar also have built-in support for delayed messages. Although each MQ project differs in usage and implementation, the core idea is the same: the Producer sends a delayed message to a topic, the Broker places the delayed message in temporary storage, and a delayed tracking service (Delayed Tracker Service) checks whether messages have expired and delivers those that have.

Use scenarios for delayed message delivery

Delayed message delivery suspends processing of the current message and triggers delivery at some point in the future. It has many practical applications, such as retrying after a detected failure, cancelling orders on timeout, and sending appointment reminders:

A service request fails, so the failed request is placed in a separate queue and retried after 5 minutes;
A user places an order but has not paid, so the user is reminded to pay at regular intervals, and the order is closed when the time limit expires;
An interview or meeting has been scheduled, so a reminder notification is sent half an hour before it starts.

Recently, one of our business products ran into a use case for Pulsar delayed messages: the business needs to correlate log messages from two systems. One of the systems may time out or fail when querying HBase, and the failed correlation tasks need to be rescheduled when the cluster is idle.

How to use Pulsar to delay message delivery

Pulsar introduced delayed message delivery in 2.4.0. With delayed messages in Pulsar, you can specify the delivery time precisely. There are two methods: deliverAfter and deliverAt. deliverAt takes a specific timestamp, while deliverAfter takes a length of time to wait after the current time. The two are essentially the same: with deliverAfter, the Client computes the timestamp and sends it to the Broker.

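Sending with deliverAfter

// Signature: deliverAfter(long delay, TimeUnit unit). Assumes a Producer<String>.
producer.newMessage()
        .value("hello")
        .deliverAfter(3L, TimeUnit.MINUTES)
        .send();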

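Sending with deliverAt

// Signature: deliverAt(long timestamp), in epoch milliseconds. Assumes a Producer<String>.
producer.newMessage()
        .value("hello")
        .deliverAt(System.currentTimeMillis() + 180_000L)
        .send();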
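To make the equivalence of the two methods concrete, here is a minimal sketch of the timestamp arithmetic described above. The class and method names (DeliverTimeSketch, deliverAtFromDelay) are hypothetical and are not part of the Pulsar client API; the sketch only assumes that the timestamp is expressed in epoch milliseconds.

```java
import java.util.concurrent.TimeUnit;

final class DeliverTimeSketch {
    // Hypothetical helper: converts a relative delay into the absolute
    // delivery timestamp (epoch milliseconds) that would be sent to the Broker.
    static long deliverAtFromDelay(long nowMillis, long delay, TimeUnit unit) {
        return nowMillis + unit.toMillis(delay);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long deliverAt = deliverAtFromDelay(now, 3, TimeUnit.MINUTES);
        System.out.println("deliver at epoch millis: " + deliverAt);
    }
}
```

Under this view, a delay of 3 minutes and an absolute timestamp of "now plus 180,000 ms" describe the same delivery time.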

Pulsar supports delays that span a long time, such as one month or six months. A single topic can also carry both delayed and non-delayed messages. The following figure shows how delayed messages flow through Pulsar:

Messages m1, m3, m4, and m5 sent by the producer have different delay times, while m2 is a normal message that needs no delay. The consumer receives and acknowledges each message according to its delay time.

Pulsar delayed message delivery implementation principle

As the usage above shows, Pulsar supports delayed message delivery with second-level precision, unlike open source RocketMQ, which supports only fixed delay levels.

Pulsar implements delayed message delivery in a relatively simple way. For every delayed message, the Delayed Message Tracker records a corresponding index. The index has three parts: timestamp | LedgerID | EntryID. LedgerID | EntryID locates the message, while the timestamp records when the message should be delivered and also serves as the sort key for the priority queue of delayed indexes.

The Delayed Message Tracker maintains a priority queue of delayed indexes in off-heap memory, ordered as a heap by delivery time. The entry with the shortest remaining delay sits at the head, and entries with longer delays sit further back. When a consumer consumes, it first checks the Delayed Message Tracker for expired messages that are due for delivery. If there are any, the corresponding indexes are taken from the tracker and the matching messages are looked up and consumed. If no message has expired, normal messages are consumed directly.
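The tracker described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Pulsar's actual implementation: the class name DelayedIndexSketch and its methods are hypothetical, and the sketch uses an ordinary on-heap java.util.PriorityQueue rather than the off-heap queue the text describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class DelayedIndexSketch {
    /** One index entry: delivery timestamp plus the (ledgerId, entryId) position. */
    static final class Entry implements Comparable<Entry> {
        final long deliverAt;
        final long ledgerId;
        final long entryId;

        Entry(long deliverAt, long ledgerId, long entryId) {
            this.deliverAt = deliverAt;
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        // Order by delivery time first, then by position to break ties.
        @Override
        public int compareTo(Entry o) {
            int c = Long.compare(deliverAt, o.deliverAt);
            if (c != 0) return c;
            c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }
    }

    // Min-heap: the entry with the earliest delivery time is at the head.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    void add(long deliverAt, long ledgerId, long entryId) {
        queue.add(new Entry(deliverAt, ledgerId, entryId));
    }

    /** Removes and returns every entry due at or before nowMillis, earliest first. */
    List<Entry> pollExpired(long nowMillis) {
        List<Entry> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deliverAt <= nowMillis) {
            due.add(queue.poll());
        }
        return due;
    }

    int size() {
        return queue.size();
    }
}
```

Because only the head of the heap has to be inspected, checking for due messages is cheap when nothing has expired, which fits the "check the tracker first, then consume normal messages" flow.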

If a Broker goes down or topic ownership is transferred within the cluster, Pulsar rebuilds the delayed index queue so that delayed messages continue to work normally.

Challenges of Pulsar's delayed message delivery

The implementation principle shows that Pulsar's approach to delayed message delivery is simple and efficient, is minimally intrusive to the Pulsar kernel, and supports delays of arbitrary length. At the same time, we found that it cannot support large-scale use of delayed messages, mainly for the following two reasons:

Delayed index queue is limited by memory

The delayed index of one delayed message consists of three longs. For a small number of delayed messages, the memory overhead is modest. However, the index queue is kept per subscription, so for a single partition of a topic there must be as many index queues as there are subscriptions. In addition, the more delayed messages there are and the longer their delays, the more memory the index queues occupy.
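A back-of-the-envelope estimate makes the memory pressure concrete. The sketch below counts only the raw index payload (three 8-byte longs per entry, one queue per subscription) and ignores any queue overhead; the class name and the workload numbers in main are hypothetical, chosen purely for illustration.

```java
final class IndexMemoryEstimate {
    // Three longs per index entry: timestamp, ledgerId, entryId.
    static final long BYTES_PER_ENTRY = 3L * Long.BYTES;

    // Raw index bytes for one topic partition: every subscription keeps its own queue.
    static long estimateBytes(long pendingDelayedMessages, int subscriptions) {
        long entries = Math.multiplyExact(pendingDelayedMessages, (long) subscriptions);
        return Math.multiplyExact(entries, BYTES_PER_ENTRY);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 100 million pending delayed messages, 10 subscriptions.
        long bytes = estimateBytes(100_000_000L, 10);
        System.out.println(bytes + " bytes of raw index data");
    }
}
```

With these assumed numbers the raw index alone comes to 24,000,000,000 bytes, or about 24 GB, for a single partition.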

Delayed index queue reconstruction time overhead

As mentioned above, Pulsar rebuilds the delayed index queue if a Broker goes down or topic ownership is transferred within the cluster. For large volumes of delayed messages that span a long time, rebuilding may take on the order of hours. Allocating more partitions to the topic increases the concurrency of the rebuild and shortens it, but this does not fully solve the problem of reconstruction time overhead.

Pulsar delayed message delivery future work

Pulsar's current delayed message delivery solution is simple and efficient, but it still carries risks when handling large volumes of delayed messages. As a next step, the Pulsar community and the MQ team of the Tencent Data Platform Department will focus on supporting large-scale delayed messages. The solution currently under discussion is to add time partitions to the delayed index queue: the Broker loads only the delayed index for the current time slice into memory and persists the remaining time partitions to disk. An example is shown in the following figure:

In the figure above, the delayed index queue is partitioned at 5-minute intervals. m5 and m1 fall into time partition 1; because their delivery times are the nearest, their indexes are kept in memory. m4 and m3 fall into time partition 2; because their delivery times are later, their indexes are stored on disk. This solution reduces both the time overhead of rebuilding the delayed index queue and the dependence on memory.
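The time-partitioning idea can be sketched as follows. This is only an illustration of the proposal as described here, not a design the Pulsar community has adopted: the class name, the method names, and the choice of a TreeMap are assumptions, and the "on disk" buckets are simulated in memory so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TimeBucketedIndexSketch {
    // 5-minute time partitions, as in the example above.
    static final long BUCKET_MILLIS = 5 * 60 * 1000L;

    // Bucket start time -> index entries {deliverAt, ledgerId, entryId}.
    // A real implementation would persist non-current buckets to disk.
    private final TreeMap<Long, List<long[]>> buckets = new TreeMap<>();

    // Start of the 5-minute bucket that contains the given timestamp.
    static long bucketStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, BUCKET_MILLIS) * BUCKET_MILLIS;
    }

    void add(long deliverAt, long ledgerId, long entryId) {
        buckets.computeIfAbsent(bucketStart(deliverAt), k -> new ArrayList<>())
               .add(new long[] {deliverAt, ledgerId, entryId});
    }

    /** Entries in the bucket containing nowMillis: the slice that would be held in memory. */
    List<long[]> inMemorySlice(long nowMillis) {
        return buckets.getOrDefault(bucketStart(nowMillis), new ArrayList<>());
    }

    /** Number of entries in later buckets: those that would stay on disk. */
    int onDiskCount(long nowMillis) {
        int count = 0;
        for (List<long[]> bucket : buckets.tailMap(bucketStart(nowMillis), false).values()) {
            count += bucket.size();
        }
        return count;
    }
}
```

In this sketch, a rebuild after a Broker failure would only need to reload the current bucket, which is where the reduction in reconstruction time would come from.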

Concluding remarks

This article introduced the concepts and use cases of delayed message delivery and explained in detail how Apache Pulsar implements it. Pulsar's current solution is simple and efficient and supports delayed message delivery with second-level precision, but it has limitations when handling large volumes of delayed messages. As a next step, the Pulsar community and the MQ team of the Tencent Data Platform Department will focus on supporting large-scale delayed messages.
