Time wheel algorithm design, as seen in Kafka

Writing original content is not easy; please credit the source when reprinting.

Preface

Kafka has many delayed operations. For example, a time-consuming request (such as a Produce request that must wait for the ISR replicas to finish replicating) is wrapped in a DelayedOperation and completed later, so that it does not block Kafka's request-handling threads.

Kafka does not use the Timer and DelayQueue implementations that ship with the JDK. Insertion and deletion in both have O(log n) time complexity, which does not meet Kafka's performance requirements.

Side note: the JDK's Timer and DelayQueue are both built on a priority queue, that is, a min-heap. The task that is due soonest sits at the head of the queue. The difference is that Timer has its own thread that pulls tasks and runs them, while DelayQueue is only a container and must be paired with other threads. ScheduledThreadPoolExecutor is another way to run timed tasks in the JDK; in essence it combines a delay queue with a pool of threads.
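
To make the O(log n) cost concrete, here is a minimal Python sketch of a heap-backed delay queue. It is not the JDK code; the class name HeapDelayQueue and its methods are made up for illustration. Every insert and every removal goes through the heap, which is where the logarithmic cost comes from.

```python
import heapq

class HeapDelayQueue:
    """Hypothetical toy delay queue: a min-heap ordered by expiration time."""

    def __init__(self):
        self._heap = []  # entries are (expiration_ms, seq, task_name)
        self._seq = 0    # tie-breaker so equal expirations keep insertion order

    def add(self, expiration_ms, task_name):
        # heappush sifts the new entry up the heap: O(log n)
        heapq.heappush(self._heap, (expiration_ms, self._seq, task_name))
        self._seq += 1

    def poll_expired(self, now_ms):
        # Pop every task whose expiration is <= now_ms; each pop is O(log n)
        expired = []
        while self._heap and self._heap[0][0] <= now_ms:
            expired.append(heapq.heappop(self._heap)[2])
        return expired

queue = HeapDelayQueue()
queue.add(300, "c")
queue.add(100, "a")
queue.add(200, "b")
first = queue.poll_expired(250)   # tasks due by t=250
rest = queue.poll_expired(1000)   # everything else
```

The soonest task is always at the head, which matches the description above, but no operation on an individual task is cheaper than O(log n).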

Kafka implements delayed operations on top of a time wheel. Insertion and deletion in the time wheel algorithm are both O(1), which meets Kafka's performance requirements. Besides Kafka, open source projects such as Netty, ZooKeeper, and Dubbo also contain time wheel implementations.

So what is the time wheel algorithm, what is the idea behind it, and how is it implemented in Kafka?

Kafka time wheel algorithm

The idea behind the time wheel can be understood through the clocks of everyday life.

The TimingWheel in Kafka is a circular queue that stores timer tasks. It is backed by an array, and each element of the array holds a timer task list (TimerTaskList). A TimerTaskList is a circular doubly linked list. Each node in the list is a timer task entry (TimerTaskEntry), which wraps the actual timer task (TimerTask).
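
The structure described above can be sketched as follows. This is a simplified Python illustration, not Kafka's source: the class names follow the article, but the fields and methods are assumptions chosen for brevity. The point is that adding and removing an entry only touches its neighbours, so both are O(1).

```python
class TimerTaskEntry:
    """Hypothetical list node wrapping one task (here just a name)."""

    def __init__(self, task, expiration_ms):
        self.task = task
        self.expiration_ms = expiration_ms
        self.prev = None
        self.next = None

class TimerTaskList:
    """Circular doubly linked list with a sentinel root node."""

    def __init__(self):
        self.root = TimerTaskEntry(None, -1)
        self.root.prev = self.root
        self.root.next = self.root

    def add(self, entry):
        # Link the entry just before the root, i.e. at the tail: O(1)
        tail = self.root.prev
        entry.prev = tail
        entry.next = self.root
        tail.next = entry
        self.root.prev = entry

    def remove(self, entry):
        # Unlink through the entry's own neighbours, no traversal: O(1)
        entry.prev.next = entry.next
        entry.next.prev = entry.prev
        entry.prev = None
        entry.next = None

    def tasks(self):
        result = []
        node = self.root.next
        while node is not self.root:
            result.append(node.task)
            node = node.next
        return result

# The wheel itself is then an array of such lists, one per bucket.
wheel_size = 8
buckets = [TimerTaskList() for _ in range(wheel_size)]

a, b, c = (TimerTaskEntry(name, 100) for name in "abc")
for entry in (a, b, c):
    buckets[0].add(entry)
buckets[0].remove(b)            # cancel "b" without scanning the list
remaining = buckets[0].tasks()
```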

Several parameters in the figure:

tickMs: the time span of one bucket (one tick)
wheelSize: the number of buckets in the time wheel
startMs: the start time
interval: the overall time span of the time wheel, equal to tickMs * wheelSize
currentTime: an integer multiple of tickMs, representing the current time of the time wheel
- currentTime divides the time wheel into an expired part and an unexpired part. The bucket that currentTime points to belongs to the expired part: it has just expired, so all tasks in the TimerTaskList of that bucket need to be processed.

The overall span of the time wheel never changes. As the pointer currentTime advances, the window of time the wheel can handle shifts forward with it, always covering the range from currentTime to currentTime + interval.

You may now wonder how this abstract currentTime is actually advanced. That question is answered further below.
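
The relationship between these parameters can be sketched in a few lines of Python. The function name locate and its return format are made up for illustration, and the three-way split reflects my understanding of how a single wheel classifies a task rather than a copy of Kafka's code.

```python
def locate(tick_ms, wheel_size, now_ms, expiration_ms):
    """Classify a task for one wheel: already expired, a bucket index, or overflow."""
    interval = tick_ms * wheel_size
    # currentTime is always rounded down to an integer multiple of tickMs
    current_time = now_ms - (now_ms % tick_ms)
    if expiration_ms < current_time + tick_ms:
        return ("expired", None)        # falls into the bucket currentTime points to
    if expiration_ms < current_time + interval:
        virtual_id = expiration_ms // tick_ms
        return ("bucket", virtual_id % wheel_size)
    return ("overflow", None)           # beyond currentTime + interval

# Example with assumed values: tickMs = 1, wheelSize = 20, clock at 5 ms
r_expired = locate(1, 20, 5, 5)
r_near = locate(1, 20, 5, 8)
r_wrapped = locate(1, 20, 5, 24)     # 24 % 20 = 4, the array is reused circularly
r_far = locate(1, 20, 5, 25)         # 25 >= 5 + 20, does not fit in this wheel
```

Note how the task expiring at 24 ms lands in bucket 4: the window has shifted forward with currentTime, so buckets behind the pointer are reused for later times.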

So how to support long-span timing tasks?

If timer tasks with delays of hundreds of thousands of milliseconds must be supported, should the wheel's array simply be made larger? In practice there are two solutions:

Add the concept of rounds/laps (Netty's HashedWheelTimer)
- For example, with 8 slots (indexes 0-7) and a task due in 41 ticks, 41 % 8 + 1 = 2, so the task goes into the 2nd slot, which has index 1. Then (41 - 1) / 8 = 5, so the number of rounds is recorded as 5. In other words, the task fires when the pointer reaches the slot at index 1 after 5 full rotations.
- The implementation details are not covered here.
Use a multi-layer (hierarchical) time wheel (Kafka's TimingWheel)
- Compared with the previous solution, the hierarchical time wheel gives finer control over time granularity, copes with more complex timer scenarios, and has a wider range of applications.
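
The rounds arithmetic from the first solution can be checked with a short simulation. This follows the article's worked example, not Netty's actual code; the function names are hypothetical, and the pointer is assumed to start at index 0 and move one slot per tick.

```python
def place_with_rounds(delay_ticks, wheel_size):
    """Return (slot index, remaining rounds) for a task due in delay_ticks ticks."""
    index = delay_ticks % wheel_size
    rounds = (delay_ticks - 1) // wheel_size
    return index, rounds

def fire_tick(delay_ticks, wheel_size):
    """Simulate the pointer and return the tick at which the task fires."""
    index, rounds = place_with_rounds(delay_ticks, wheel_size)
    tick = 0
    while True:
        tick += 1                        # the pointer moves one slot per tick
        if tick % wheel_size == index:   # the pointer has reached the task's slot
            if rounds == 0:
                return tick              # no rounds left: fire now
            rounds -= 1                  # otherwise wait for another full rotation

placement = place_with_rounds(41, 8)     # the example above: index 1, 5 rounds
fired_at = fire_tick(41, 8)
```

The cost of this design is that every visit to a slot has to look at each task in it and decrement its round counter, even for tasks that are still many rotations away.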

The multi-layer time wheel is closer to how a clock works: the second hand, the minute hand, and the hour hand each travel their own circle, and together they form a multi-layer time wheel.

When the wheel at level N completes one full circle, the wheel at level N+1 advances by one bucket. In other words, the tickMs of the higher-level wheel equals the interval (overall span) of the current wheel.

When a task is inserted and its expiration time does not fit in the first-level wheel, insertion is attempted on the next higher wheel, and so on.

As time advances, tasks are also downgraded between wheels. A task with a long delay is taken out of the higher-level wheel, resubmitted to the timer, and then placed into the appropriate lower-level wheel to wait for processing.

How are the time wheels in Kafka linked together, and how is the relationship to the higher-level wheel represented?

It is quite simple: each wheel holds a reference to another time wheel object that is one level higher than itself.
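
The overflow relationship can be sketched as below. This is a simplified illustration under stated assumptions: plain Python lists stand in for TimerTaskList, the level field and the integer return value exist only to make the example observable, and the 1 ms tick with 20 buckets is a value chosen for the demonstration.

```python
class TimingWheel:
    def __init__(self, tick_ms, wheel_size, start_ms, level=1):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size
        self.current_time = start_ms - (start_ms % tick_ms)
        self.level = level                                # illustration only
        self.buckets = [[] for _ in range(wheel_size)]    # stand-in for TimerTaskList
        self.overflow_wheel = None                        # reference to the wheel one level up

    def add(self, task, expiration_ms):
        """Return the level the task landed on, or 0 if it has already expired."""
        if expiration_ms < self.current_time + self.tick_ms:
            return 0
        if expiration_ms < self.current_time + self.interval:
            index = (expiration_ms // self.tick_ms) % self.wheel_size
            self.buckets[index].append((expiration_ms, task))
            return self.level
        # Does not fit here: create the higher wheel lazily and delegate to it.
        if self.overflow_wheel is None:
            # The higher wheel's tick equals this wheel's whole interval.
            self.overflow_wheel = TimingWheel(
                self.interval, self.wheel_size, self.current_time, self.level + 1)
        return self.overflow_wheel.add(task, expiration_ms)

wheel = TimingWheel(tick_ms=1, wheel_size=20, start_ms=0)
level_a = wheel.add("a", 5)      # fits in level 1, which covers 20 ms
level_b = wheel.add("b", 25)     # needs level 2, which covers 400 ms
level_c = wheel.add("c", 450)    # needs level 3, which covers 8000 ms
level_d = wheel.add("d", 0)      # already due
```

With this layout, three small arrays of 20 buckets cover 8000 ms, where a single flat wheel with a 1 ms tick would need 8000 buckets.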

The remaining question is how to advance the time wheel so that its time moves forward.

The time wheel in Netty is advanced by a worker thread at a fixed interval, tickDuration
- If no task is due for a long time, the pointer keeps ticking over empty slots. This "empty advancement" wastes work and costs some performance.
Kafka advances its time wheel through a DelayQueue, trading space for time
- Every TimerTaskList that holds tasks is stored in the DelayQueue, ordered by expiration time, so that the lists due soonest are at the front.
- A separate thread (called the ExpiredOperationReaper) takes expired TimerTaskLists from the DelayQueue and then advances the time wheel exactly to the expiration time of each list, so the empty advancement problem does not arise.

Kafka is making a trade-off here and uses the DelayQueue only where it fits. The DelayQueue stores TimerTaskLists rather than every individual TimerTask, so the number of elements in it stays small. Compared with the cost of empty advancement, the benefits outweigh the drawbacks.
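
The following single-threaded Python simulation sketches how a queue of buckets can drive the wheel. It is a rough model under several assumptions, not Kafka's implementation: a heapq heap stands in for the DelayQueue, the clock is virtual and jumps straight to the next bucket expiration instead of a thread blocking on a poll, and the names Bucket, Wheel, and run are hypothetical. Reinserting the tasks of an expired higher-level bucket is what moves them down to a lower wheel.

```python
import heapq

class Bucket:
    def __init__(self):
        self.expiration = -1   # start of the time slot this bucket currently represents
        self.tasks = []        # (expiration_ms, name) pairs

class Wheel:
    def __init__(self, tick_ms, wheel_size, start_ms, queue):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size
        self.current_time = start_ms - start_ms % tick_ms
        self.buckets = [Bucket() for _ in range(wheel_size)]
        self.queue = queue     # heap shared by all levels
        self.overflow = None

    def add(self, expiration_ms, name):
        if expiration_ms < self.current_time + self.tick_ms:
            return False       # already due: the caller should run it
        if expiration_ms < self.current_time + self.interval:
            virtual_id = expiration_ms // self.tick_ms
            bucket = self.buckets[virtual_id % self.wheel_size]
            bucket.tasks.append((expiration_ms, name))
            slot_start = virtual_id * self.tick_ms
            if bucket.expiration != slot_start:
                # Only buckets enter the queue, never individual tasks.
                bucket.expiration = slot_start
                heapq.heappush(self.queue, (slot_start, id(bucket), bucket))
            return True
        if self.overflow is None:
            self.overflow = Wheel(self.interval, self.wheel_size,
                                  self.current_time, self.queue)
        return self.overflow.add(expiration_ms, name)

    def advance(self, time_ms):
        if time_ms >= self.current_time + self.tick_ms:
            self.current_time = time_ms - time_ms % self.tick_ms
            if self.overflow is not None:
                self.overflow.advance(self.current_time)

def run(tasks, tick_ms=1, wheel_size=8):
    queue = []
    wheel = Wheel(tick_ms, wheel_size, 0, queue)
    fired, polls = [], 0
    for expiration_ms, name in tasks:
        if not wheel.add(expiration_ms, name):
            fired.append((0, name))
    while queue:
        slot_start, _, bucket = heapq.heappop(queue)
        polls += 1
        wheel.advance(slot_start)        # jump directly, no empty ticks in between
        pending, bucket.tasks = bucket.tasks, []
        bucket.expiration = -1
        for expiration_ms, name in pending:
            # Reinsert: the task either fires or moves down to a lower wheel.
            if not wheel.add(expiration_ms, name):
                fired.append((slot_start, name))
    return fired, polls

fired, polls = run([(3, "a"), (12, "b"), (100, "c")])
```

In this run the three tasks fire at 3, 12, and 100 ms after only six bucket expirations, whereas a pointer that ticks every millisecond would have to step 100 times, mostly over empty slots.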

Summary

Kafka uses a time wheel to implement its delay queue. Because tasks are added to and removed from a linked list, both operations are O(1), which meets the high performance requirement.
For delayed tasks with a large time span, Kafka introduces a hierarchical time wheel, which gives finer control over time granularity and copes with more complex timer scenarios.
To advance the time wheel without paying the performance cost of empty advancement, Kafka trades space for time and drives the time wheel through a DelayQueue, a classic trade-off.

This article uses Kafka to describe the design ideas behind the time wheel algorithm and also touches on Netty's implementation. It is fairly theoretical, so reading the source code of the Kafka and Netty time wheel implementations is recommended. Neither is particularly difficult, and comparing the two is especially rewarding.

References

"In-depth understanding of Kafka"
"Analysis of Netty Core Principles and RPC Practice" column

Richard_Yi