1 Introduction
Message Queue (MQ) middleware has been popular for many years, and it appears in most Internet applications of any significant scale. There are many products on the market, including but not limited to RabbitMQ, RocketMQ, ActiveMQ, and Kafka (a stream processing platform). Many developers have mastered one or more of them, but others are not yet familiar with message middleware and, for various reasons, have not had the chance to study its principles and details in depth, which can lead to all kinds of problems in practice. In this article we analyze the typical problems encountered when using message queue middleware (including ordered messages, reliability guarantees, message idempotence, and delayed messages) and offer some solutions.
2. Message middleware application background
2.1 Basic idea of message middleware
Within a single system, some business steps can simply be executed in sequence. Across systems (and sometimes within one system), more complex data interaction is required, which can also be understood as message transmission. This data can be transferred synchronously or asynchronously. For asynchronous transfer, a carrier is needed to temporarily store and distribute the messages; a professional application designed specifically for receiving, storing, and forwarding messages is what we call message queue middleware.
By extension: if we simply use a database table to record data, write incoming data into that table, and distribute the rows through a scheduled task, we have implemented the simplest possible message system (this is the local message table pattern).
We can therefore say that the basic idea of message middleware is to provide an efficient and reliable message delivery mechanism for asynchronous data transmission. Guided by this idea, different middleware products end up with different features, performance characteristics, and overall designs because they target different scenarios and goals.
A message queue (MQ) implements a one-way communication model from producers to consumers, and products such as RabbitMQ, RocketMQ, and Kafka are middleware built around this model. Currently the most commonly used options are RabbitMQ, RocketMQ, Kafka (a distributed stream processing platform), and Pulsar (a distributed message streaming platform). Two stream processing platforms are included here, while some earlier message middleware has gradually faded from view. When selecting middleware for a business, we follow two main principles: the principle of maximum familiarity (ease of operation, maintenance, and reliable use) and the principle of business fit (the middleware's performance can support the business volume and its features meet the business requirements).
Comparisons of these common products are easy to find, so I will not go into detail here. Roughly speaking: Pulsar is not yet used as widely as RabbitMQ, RocketMQ, and Kafka; RabbitMQ focuses on highly reliable messaging; RocketMQ balances performance and features; and Kafka is mainly used in big data processing (Pulsar is similar).
2.2 The significance of introducing message middleware
Let's briefly explain the meaning and value of asynchrony, decoupling, and peak shaving with a simple example (refer to the flow chart below):
Consider a user registration interface with two business steps: registering the account and issuing a newcomer benefit, each taking 50ms of processing. If we couple the two steps in one interface, the request takes about 100ms in total. However, during registration the user does not care whether the benefit is issued immediately; they only need the registration to succeed and the response to come back quickly, and the benefit can be handled outside the main flow. If we strip it out, the interface handles only the registration logic and pushes a message through MQ, and the benefit-issuing logic is processed asynchronously afterwards; the registration interface can then return in about 50ms while the benefit is issued at its own pace by an asynchronous task. By splitting the business steps we achieve decoupling: adding or removing secondary features around registration no longer affects the main flow. Furthermore, if the main flow faces a burst of high-concurrency requests at some point, the asynchronous model spreads the pressure over a longer period and reduces the load within any fixed window; this is traffic peak shaving.
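To make the split concrete, here is a minimal, hypothetical sketch in Java (the `MessageProducer` and `UserRepository` interfaces, the `user.registered` topic, and the class names are all illustrative, not from the original example): the registration interface persists the user and publishes an event, and a separate consumer issues the newcomer benefit.

```java
// Hypothetical sketch of splitting registration and benefit issuing via MQ.
interface MessageProducer { void send(String topic, String payload); } // thin wrapper over your MQ client
interface UserRepository { long save(String name); }                   // persists the user

public class UserRegistrationService {
    private final UserRepository users;
    private final MessageProducer producer;

    public UserRegistrationService(UserRepository users, MessageProducer producer) {
        this.users = users;
        this.producer = producer;
    }

    // Main flow: only the registration logic (~50ms), then a fast publish; no waiting for the benefit.
    public long register(String name) {
        long userId = users.save(name);
        producer.send("user.registered", String.valueOf(userId)); // hand off to MQ for async processing
        return userId;
    }
}

// Runs separately as an MQ consumer, at its own pace; failures here do not affect registration latency.
class NewcomerBenefitConsumer {
    void onMessage(String userId) {
        issueBenefit(Long.parseLong(userId)); // ~50ms of benefit logic, now off the main path
    }
    private void issueBenefit(long userId) { /* grant the newcomer benefit */ }
}
```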
In addition, languages with a single-threaded model usually have a stronger need for message middleware. Languages with multi-threading or coroutines can achieve asynchronous processing inside the business through their own mechanisms, but once persistence and operational management are taken into account, mature middleware is the better choice. It is well suited to asynchronous data communication, and middleware also enables asynchronous communication between distributed systems.
2.3 Application scenarios of message middleware
The application scenarios of message middleware mainly include:
- Asynchronous communication: asynchronous communication within a business system, as well as information exchange between distributed systems
- System decoupling: separate businesses of different natures to improve performance, split main and auxiliary flows, isolate them by importance, and reduce the impact of failures
- Traffic peak shaving: spread out bursty traffic to reduce system pressure and improve availability
- Distributed transaction consistency: the transactional message feature provided by RocketMQ can handle distributed transaction consistency (for example, e-commerce order scenarios); dedicated distributed transaction middleware can of course also be used
- Ordered sending and receiving: the most basic capability of a message queue, first in, first out
- Delayed messages: business scenarios triggered after a delay, such as cancelling an unpaid order some time after it is placed
- Big data processing: log processing with Kafka
- Distributed cache synchronization: consume MySQL binlogs to synchronize caches, or push business changes directly to MQ for consumers to apply
Therefore, if your business involves the scenarios listed above, or has similar functional and performance requirements, go ahead and introduce message middleware to improve your business.
3. A series of problems caused by the introduction of message middleware
Although introducing message middleware brings the benefits above, it also raises many questions in practice. For example:
- Introducing message middleware increases system complexity: how do we use and maintain it?
- What if sending a message fails (the message is lost)?
- If, to make sure a message is delivered, it gets sent more than once, how do we handle the duplicates?
- How do we deal with exceptions while a message flows through the middleware?
- If consumption of a message fails, can the message be fetched from the middleware again?
- If it can be fetched again, might it fail again and be consumed over and over, blocking the whole process?
- If it cannot be fetched again, how do we make sure the message still gets processed?
- If the same message is processed repeatedly, will that cause business exceptions?
- How do we ensure that the consumption logic is successfully executed exactly once?
- For ordered messages, how do we keep the consumption order consistent with the sending order?
- When there are too many messages, how do we keep the consumers fast enough to meet the business requirements and avoid an ever-growing backlog?
- What if I want a message to wait a few seconds before it is consumed?
Of course, from a business developer's point of view we can distill the questions above into the following key problems:
- Message ordering guarantees
- Avoiding message loss
- Handling duplicate messages
- Handling message backlogs
- Handling delayed messages
4. Solutions to the problems
4.1 Message Ordering Guarantee
Conventional message middleware and stream processing middleware are generally designed to support ordered messages, but each product has its own design goals, principles, and architecture, so the way we handle ordering in our business differs from one middleware to another.
Below is the ordered-message design of several common message and stream middleware products, along with an analysis of the out-of-order problems that come up in practice.
RabbitMQ:
A single RabbitMQ queue guarantees first-in-first-out delivery of messages. By design, the data of a single queue is stored on a single broker node; when mirrored queues are enabled, the mirrors only hold copies of the messages, and the master queue still serves all traffic. Consumption from a single queue is therefore naturally ordered. However, a single queue supports multiple consumers at the same time, and once we attach several consumers to the same queue, messages are distributed among them; under high concurrency, the consumers cannot guarantee that messages are processed in order.
The solution is to route messages that must stay in order to the same queue and attach only one consumer to that queue (and, to keep concurrent processing ordered, only one thread as well). To offset the drop in single-queue throughput this causes, we can borrow Kafka's design: open a group of queues for one task, route messages by a fixed identifier (for example, an ID), and distribute them across the group so that messages with the same identifier always land in the same queue, each queue consumed by a single consumer. This preserves both ordering and throughput.
(Figure: route ordered messages by their identifier to a fixed queue within a queue group, with one consumer per queue.)
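To make the routing concrete, here is a minimal, hypothetical sketch using the RabbitMQ Java client (the queue names, the queue count of 4, and the `routeQueue` helper are assumptions); it presumes the queues `order_q_0` through `order_q_3` have already been declared, each with exactly one consumer.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class OrderedPublisher {
    private static final int QUEUE_COUNT = 4; // size of the queue group (assumption)

    // Messages with the same orderId always map to the same queue, so their relative order is preserved.
    static String routeQueue(long orderId) {
        return "order_q_" + Math.floorMod(Long.hashCode(orderId), QUEUE_COUNT);
    }

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection(); Channel channel = conn.createChannel()) {
            long orderId = 10001L;
            String body = "{\"orderId\":" + orderId + ",\"event\":\"CREATED\"}";
            // Publish to the default exchange; the routing key is the queue name chosen by the hash.
            channel.basicPublish("", routeQueue(orderId), null, body.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```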
Kafka:
Kafka is a stream processing platform. Its design has no concept of queues; messages are sent and received through topics. A single topic can have multiple partitions, the partitions can be spread across multiple broker nodes, and each partition can have replicas to ensure high availability.
A topic can be consumed by multiple consumers and even multiple consumer groups. Consumption in Kafka is normally organized by consumer group (different groups consume the same topic without interfering with each other), and a group can contain several consumers. When one group consumes multiple partitions of a topic, Kafka balances the assignment of partitions to the consumers in that group, but one thing is guaranteed: a single partition is consumed by at most one consumer within the same group.
With this design, Kafka guarantees that messages within a partition are ordered; it does not guarantee ordering across the whole topic. When producing a message we can choose which partition it goes to, so we only need to send the messages that must be processed in order to the same partition of the topic to guarantee their consumption order. (In multi-threaded languages a single consumer that processes data with multiple threads must additionally preserve the ordering itself; we skip that here.)
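With the Kafka Java client this usually means setting a message key: the default partitioner hashes records with the same key to the same partition. A minimal sketch (the broker address and topic name are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderedKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "10001";
            // Same key => same partition => this order's messages are consumed in the order they were sent.
            producer.send(new ProducerRecord<>("order-events", orderId, "CREATED"));
            producer.send(new ProducerRecord<>("order-events", orderId, "PAID"));
            producer.flush();
        }
    }
}
```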
RocketMQ:
Some basic concepts and principles of RocketMQ can be found on Alibaba Cloud: What is Message Queue for RocketMQ? (Message Queue RocketMQ Edition, Alibaba Cloud).
RocketMQ also sends and receives messages through topics. A topic contains multiple queues distributed across one or more brokers to support high-performance sending and receiving (similar to Kafka's topic/partition mechanism, although the internal implementation is different).
RocketMQ supports partition-level (locally) ordered messages, that is, ordered consumption within a single message queue; globally ordered consumption of a topic is not supported out of the box. If you want global ordering for a topic, you can set the topic's queue count to 1, sacrificing high availability. For diagrams, see the Alibaba Cloud documentation: Ordered Messages 2.0 (Message Queue RocketMQ Edition, Alibaba Cloud).
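With the open-source RocketMQ Java client, partition-ordered sending is done by choosing the queue through a `MessageQueueSelector`, typically keyed by the business ID. A minimal sketch (the producer group, name server address, and topic are placeholders):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.MessageQueueSelector;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageQueue;

public class OrderedRocketProducer {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("order_producer_group"); // placeholder group
        producer.setNamesrvAddr("127.0.0.1:9876");                                  // placeholder address
        producer.start();

        long orderId = 10001L;
        Message msg = new Message("order-topic", "CREATED".getBytes(StandardCharsets.UTF_8));

        // The selector maps the same orderId to the same queue, so that order's messages stay in sequence.
        producer.send(msg, new MessageQueueSelector() {
            @Override
            public MessageQueue select(List<MessageQueue> mqs, Message m, Object arg) {
                long id = (Long) arg;
                return mqs.get((int) (id % mqs.size()));
            }
        }, orderId);

        producer.shutdown();
    }
}
```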
4.2 Avoid message loss
Message loss has to be considered in three stages: the producer must not lose the message while sending it to the middleware; the middleware must not lose the message between receiving and storing it and its being consumed; and the consumer must not lose any message that the middleware has delivered to it.
No loss when producers send messages:
Message middleware generally provides a send-acknowledgement (ACK) mechanism. As long as the client requires an ACK for each send, it can tell from the returned result whether the message reached the middleware. This step is closely related to how the middleware accepts and stores messages. Depending on the middleware's design, the usual measures are:
- Turn on MQ's ACK (or confirm) mechanism to directly know the result of message sending
- Enable the message queue's persistence mechanism (flushing to disk; this may require explicit configuration)
- The middleware itself is deployed for high availability
- Compensation design for message sending failure (retry, etc.)
In the specific business design, if sending fails we can compensate according to how important the message is, for example (a minimal sketch follows this list):
- A retry mechanism for failed sends (keep re-sending on failure, with an upper limit on retries)
- If it still fails, choose a degradation plan according to the importance of the message: discard it, or fall back to other middleware or another carrier (with a corresponding compensating push or consumption design)
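As an illustration of the compensation above, here is a hypothetical send-with-confirm-and-retry sketch using RabbitMQ publisher confirms (the exchange, routing key, retry limit, and fallback store are assumptions); after exhausting the retries it degrades to persisting the message for later compensation.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class ReliablePublisher {
    private static final int MAX_RETRIES = 3; // assumed retry upper limit

    // Returns true once the broker has confirmed the message; otherwise degrades after the retries.
    public boolean sendWithRetry(Channel channel, String exchange, String routingKey, String body) throws Exception {
        channel.confirmSelect(); // enable publisher confirms on this channel
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                channel.basicPublish(exchange, routingKey,
                        MessageProperties.PERSISTENT_TEXT_PLAIN,     // mark the message persistent
                        body.getBytes(StandardCharsets.UTF_8));
                channel.waitForConfirmsOrDie(5_000);                 // throws if the broker does not confirm in time
                return true;
            } catch (Exception e) {
                // log and retry; after the last attempt, fall through to the degradation path
            }
        }
        saveToFallbackStore(routingKey, body); // e.g. a DB table used for later compensation (assumption)
        return false;
    }

    private void saveToFallbackStore(String routingKey, String body) { /* persist for manual/scheduled retry */ }
}
```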
No message loss inside the middleware:
Different message middleware products receive and store messages in different ways, but each is designed, according to its own characteristics, to avoid losing messages as far as possible:
RabbitMQ message reception and storage:
- RabbitMQ producers can enable confirm mode so that the sender is notified whether each message was received successfully
- Queue and message persistence must be enabled to make sure messages are written to disk (see the sketch after this list)
- RabbitMQ uses mirrored queues for high availability of message queues, but the mirrors only serve as message copies; traffic is still served by the master queue
- When the master goes down, one of the slaves is elected as the new master and continues to provide service
- The master broadcasts its latest production and consumption state to the slaves
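As referenced in the list above, a minimal sketch of declaring a durable queue and publishing a persistent message with the RabbitMQ Java client (the queue name is a placeholder). Durability alone is not a complete guarantee, since a message can still be lost if the broker crashes before flushing, which is why confirms and mirroring are used alongside it.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class DurablePublishExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection(); Channel channel = conn.createChannel()) {
            // durable = true: the queue definition survives a broker restart
            channel.queueDeclare("reliable_q", true, false, false, null);
            // PERSISTENT_TEXT_PLAIN sets deliveryMode = 2, so the message itself is written to disk
            channel.basicPublish("", "reliable_q",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```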
RocketMQ message reception and storage:
- RocketMQ offers three ways to send ordinary messages: synchronous (Sync), asynchronous (Async), and one-way (Oneway). For the differences and reliability guarantees, see Sending Normal Messages (Three Methods) (Message Queue RocketMQ Edition, Alibaba Cloud).
- RocketMQ's HA mechanism is master-slave synchronization: after a message is sent to the master broker for the topic and the specific message queue, it is synchronized to the slaves.
- Only the master broker can receive messages from producers; consumers can pull and consume messages from either the master or a slave.
Kafka's design for receiving and storing messages is as follows:
- The partition-replica design ensures high availability of messages; the number of replicas can be set when a topic is created
- Producers can choose what kind of acknowledgement (acks) to require: after the message is fully committed (written to all in-sync replicas), after it is written to the leader replica, or as soon as it has been sent over the network (see the sketch after this list)
- When a message is written to a partition it initially sits only in the file-system cache of the replicas rather than being flushed straight to disk, so a single replica can still lose data if it crashes. Kafka does not guarantee that a single partition replica never loses data; instead it relies on the replica mechanism (replicas spread across different brokers) to keep messages intact
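A minimal sketch of the producer-side acknowledgement setting mentioned above, using the Kafka Java client (the address and topic are placeholders); `acks=all` combined with retries gives the strongest producer-side delivery guarantee.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafeKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas ("1" = leader only, "0" = no wait)
        props.put(ProducerConfig.RETRIES_CONFIG, 3);  // retry transient send failures

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // still failed after retries: trigger the degradation/compensation path
                }
            });
            producer.flush();
        }
    }
}
```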
Retention limits for stored messages:
- Kafka caps the data stored under a topic in two ways: a size limit and a time limit. When either is hit, the oldest messages are deleted. This needs special attention.
- RocketMQ also has an upper limit on how long messages are kept on the server; messages beyond the limit are deleted, so corresponding allowances have to be made.
- Storage is bounded by the capacity of the persistent disks; the backlog cannot exceed the disk limit
- If consumption goes wrong, enough redundancy has to be reserved so that data is not lost simply because it was not consumed in time
No message loss on the consumer side:
- Enable the corresponding ACK mechanism when consuming as well: acknowledge only after the message has been consumed successfully (for Kafka, this means committing the consumed offset)
- For RocketMQ, which has a built-in re-consumption design, set a maximum number of consumption attempts; failed messages will be delivered and consumed again.
Consumer ACKs bring two problems:
- If consumption fails and the message can never be acknowledged, consumption may block indefinitely on that one message
- Failure and re-delivery lead to the same message being consumed more than once
For the blocking problem, we can borrow the retry mechanism RocketMQ uses for failed consumption and design message retries explicitly (a sketch follows this list):
- Carry a retry-count attribute on the message body; when consumption fails, increment the count and re-send the message to the middleware to wait for the next attempt, and directly ACK the original delivery
- Once the retry count reaches its upper limit and the message still fails, switch to the degradation plan and persist the message in a fallback store such as a DB table
- Compensate for the failed messages manually or with a scheduled task
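A hypothetical sketch of this retry-then-degrade loop on a RabbitMQ consumer (the `x-retry-count` header name, the retry limit, and the failure table are my assumptions, not a built-in feature):

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.DeliverCallback;
import java.util.HashMap;
import java.util.Map;

public class RetryingConsumer {
    private static final int MAX_RETRIES = 3; // assumed upper limit

    public void start(Channel channel, String queue) throws Exception {
        DeliverCallback callback = (consumerTag, delivery) -> {
            int retries = 0;
            Map<String, Object> headers = delivery.getProperties().getHeaders();
            if (headers != null && headers.get("x-retry-count") != null) {
                retries = ((Number) headers.get("x-retry-count")).intValue();
            }
            try {
                process(new String(delivery.getBody()));                     // business logic
            } catch (Exception e) {
                if (retries < MAX_RETRIES) {
                    // re-publish with an incremented retry count instead of blocking on this message
                    Map<String, Object> newHeaders = new HashMap<>();
                    newHeaders.put("x-retry-count", retries + 1);
                    AMQP.BasicProperties props = new AMQP.BasicProperties.Builder().headers(newHeaders).build();
                    channel.basicPublish("", queue, props, delivery.getBody());
                } else {
                    saveToFailureTable(delivery.getBody());                   // degrade: persist for manual/scheduled compensation
                }
            }
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false); // always ACK the original delivery
        };
        channel.basicConsume(queue, false, callback, consumerTag -> { });
    }

    private void process(String body) { /* consume */ }
    private void saveToFailureTable(byte[] body) { /* write to DB */ }
}
```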
Refer to the next section for the problem of repeated consumption of messages.
4.3 Duplication of messages (consumption idempotence)
When analyzing common middleware we find that the designers generally delegate this problem to the users of the middleware, that is, to business developers. This is understandable: the consumer's processing logic is far more complex than the producer's. The producer only has to make sure the message reaches the middleware, while the consumer has to handle all kinds of business logic in the consumption code.
The core of solving duplicate consumption is to use a unique identifier to mark whether a message has already been processed. There are many concrete options, for example (a sketch follows this list):
- Use a database auto-increment primary key or unique key so that duplicate writes cannot change the data
- Use intermediate states and the required order of state transitions to decide whether the business step has already been handled
- Keep a log table of the IDs of successfully processed messages; if an incoming message's ID is already in the table, skip it
- Or, keyed by the message's unique identifier, maintain a processed-message cache in a NoSQL store such as Redis
- If the consumer's business flow is long, the developer also needs to make the data processing across the whole consumption logic transactional.
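A minimal, hypothetical sketch of the Redis option using Jedis (the key prefix and the marker TTL are assumptions): SETNX marks the message ID as processed, and only the first writer runs the business logic.

```java
import redis.clients.jedis.Jedis;

public class IdempotentConsumer {
    private final Jedis jedis = new Jedis("localhost", 6379); // placeholder connection

    public void onMessage(String messageId, String payload) {
        String key = "mq:processed:" + messageId; // assumed key prefix
        // SETNX succeeds only for the first caller; later duplicates of the same message are skipped.
        if (jedis.setnx(key, "1") == 1) {
            jedis.expire(key, 7 * 24 * 3600); // keep the marker for 7 days (assumption)
            try {
                handleBusiness(payload);      // the actual consumption logic
            } catch (RuntimeException e) {
                jedis.del(key);               // remove the marker so the message can be retried
                throw e;
            }
        }
    }

    private void handleBusiness(String payload) { /* ... */ }
}
```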
4.4 Message Backlog Processing
Usually, before introducing message middleware we have already evaluated and tested the production and consumption rates and tried to keep them balanced. Even so, unforeseen spikes in the business can still cause a large backlog of messages. When that happens we can respond in the following ways:
Temporary emergency expansion
- Add more consumer scripts to raise the consumption rate; if there is no downstream bottleneck, the backlog can be worked off quickly
- If the consumers' downstream processing capacity is limited, consider creating a temporary queue: move messages into it quickly with a temporary script so that the online business keeps flowing, then start additional consumer scripts to process the backlog (ordered messages need extra handling to guarantee the final processing order)
- Optimize the processing speed of the consumer scripts and push past the downstream limits where possible, for example with batch processing or by scaling out the downstream systems
Message Backlog Prevention
- Design the business and its degradation paths well to avoid producing invalid messages that occupy resources
- Dynamically scale the number of consumers up or down according to the size of the backlog
- Prepare emergency plans for message backlogs so that abnormal situations can be handled quickly according to the plan
4.5 Delayed message processing
Delayed messages are implemented in some MQ middleware; delayed messages and scheduled (timed) messages can be converted into each other.
RocketMQ:
RocketMQ's scheduled messages do not support arbitrary time precision (for performance reasons); only a fixed set of delay levels is supported. The delay levels are configured on the broker via messageDelayLevel. Internally, the broker creates a consumption queue for each delay level plus a corresponding timer task, which pulls messages from that delay queue and, once the delay expires, restores each message to its original topic and original consumption queue.
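With the open-source RocketMQ Java client the delay level is set on the message itself. A minimal sketch (the producer group, name server address, and topic are placeholders; level 3 corresponds to a 10-second delay under the default messageDelayLevel configuration):

```java
import java.nio.charset.StandardCharsets;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;

public class DelayedRocketProducer {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("delay_producer_group"); // placeholder
        producer.setNamesrvAddr("127.0.0.1:9876");                                  // placeholder
        producer.start();

        Message msg = new Message("order-timeout-topic",
                "orderId=10001".getBytes(StandardCharsets.UTF_8));
        msg.setDelayTimeLevel(3); // level 3 = 10s with the default "1s 5s 10s 30s 1m ..." configuration

        producer.send(msg);       // the consumer only sees this message after the delay
        producer.shutdown();
    }
}
```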
RabbitMQ:
RabbitMQ usually has two ways to implement delayed messages. One is to build a delay queue with TTL and dead-lettering: messages sit in a queue with a TTL and no consumer, and when they expire they are dead-lettered to a forwarding queue where they are actually consumed. The drawback is that messages expire in queue order, so if an earlier message has not yet reached its TTL, a later message will not be forwarded to the forwarding queue and consumed even if its own TTL has passed. The other common approach is the community delayed-message exchange plugin (rabbitmq_delayed_message_exchange).
Kafka itself does not support delayed or scheduled messages; if you need delays, you have to implement them with other means.
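A hypothetical sketch of the TTL plus dead-letter approach with the RabbitMQ Java client (the queue names and the 10-second TTL are assumptions): messages sit in `delay_q`, which has no consumer, and on expiry they are dead-lettered to `work_q`, where the real consumer listens.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TtlDeadLetterDelay {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection(); Channel channel = conn.createChannel()) {
            // The real queue that the business consumer listens on.
            channel.queueDeclare("work_q", true, false, false, null);

            // The delay queue: no consumer; expired messages are dead-lettered to work_q.
            Map<String, Object> argsMap = new HashMap<>();
            argsMap.put("x-message-ttl", 10_000);               // 10s delay (assumption)
            argsMap.put("x-dead-letter-exchange", "");          // default exchange
            argsMap.put("x-dead-letter-routing-key", "work_q"); // route expired messages to work_q
            channel.queueDeclare("delay_q", true, false, false, argsMap);

            channel.basicPublish("", "delay_q", null,
                    "cancel-order-10001".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```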
Implement delayed messages with the help of databases and scheduled tasks:
The index structures of common databases support ordered access to data, so it is easy to implement delayed consumption at arbitrary times with a database. Store each message together with the time at which it becomes consumable, and run a scheduled task that pulls the messages whose time has arrived and either forwards them to an ordinary queue or processes them directly (processed rows must be marked so they are not picked up again). Direct processing has to deal with throughput and concurrent-duplication issues, so it is usually more convenient for a single script to forward the messages to a normal queue. The backlog a database can hold is predictable and controllable, but throughput is limited.
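A hypothetical sketch of the polling side using a `ScheduledExecutorService` and JDBC (the `delayed_message` table, its columns, and the one-second poll interval are assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class DelayedMessagePoller {
    private final DataSource dataSource; // configured elsewhere (assumption)

    public DelayedMessagePoller(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, 1, TimeUnit.SECONDS); // poll every second
    }

    // Pull due messages (consume_at <= now, not yet processed), forward them, then mark them done.
    private void pollOnce() {
        String select = "SELECT id, payload FROM delayed_message "
                + "WHERE consume_at <= NOW() AND processed = 0 ORDER BY consume_at LIMIT 100";
        String update = "UPDATE delayed_message SET processed = 1 WHERE id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement selectStmt = conn.prepareStatement(select);
             PreparedStatement updateStmt = conn.prepareStatement(update);
             ResultSet rs = selectStmt.executeQuery()) {
            while (rs.next()) {
                forwardToNormalQueue(rs.getString("payload")); // hand over to an ordinary MQ queue
                updateStmt.setLong(1, rs.getLong("id"));
                updateStmt.executeUpdate();                    // mark as processed so it is not picked up again
            }
        } catch (Exception e) {
            // log and wait for the next tick
        }
    }

    private void forwardToNormalQueue(String payload) { /* publish to MQ */ }
}
```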
Implement delayed messages with a Redis sorted set:
Redis's sorted set (zset) structure can implement delayed messages: add each message to a zset with its consumption time as the score, and consume due messages with the zrangebyscore command.
```
# Command format: zrangebyscore key min max withscores limit 0 1  -- consume the earliest due message
# min and max are the start and end of the score range; using 0 and the current timestamp
# selects the messages whose consumption time has arrived
# withscores returns the score with each member; limit takes the starting offset and the count
zrangebyscore key 0 {current timestamp} withscores limit 0 1
```
Of course, this solution has limitations. First, Redis must be configured with persistence to prevent message loss (even then it cannot be 100% guaranteed unless every command is persisted, which degrades performance, so a trade-off is needed). Second, if too many messages are delayed, the backlog turns the zset into a big key. Finally, you have to handle duplicate consumption and consumption failures yourself (it is feasible; it is recommended to run a single transfer process that moves due delayed messages into an ordinary queue for consumption).
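A minimal, hypothetical sketch of this scheme with Jedis (the key name and the single-threaded transfer loop are assumptions): producers zadd the message with its due timestamp as the score, and one transfer process pops due members and forwards them to an ordinary queue.

```java
import redis.clients.jedis.Jedis;

public class RedisDelayQueue {
    private static final String KEY = "delay:messages"; // assumed zset key

    private final Jedis jedis = new Jedis("localhost", 6379); // placeholder connection

    // Producer side: the score is the timestamp (ms) at which the message becomes consumable.
    public void publish(String payload, long consumeAtMillis) {
        jedis.zadd(KEY, consumeAtMillis, payload);
    }

    // Single transfer process: move due messages into an ordinary queue, one at a time.
    public void pollOnce() {
        long now = System.currentTimeMillis();
        // Fetch at most one message whose score (due time) has already passed.
        for (String payload : jedis.zrangeByScore(KEY, 0, now, 0, 1)) {
            // zrem returns 1 only for the caller that actually removed the member, avoiding double forwarding.
            if (jedis.zrem(KEY, payload) == 1) {
                forwardToNormalQueue(payload);
            }
        }
    }

    private void forwardToNormalQueue(String payload) { /* publish to MQ */ }
}
```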
Task scheduling based on time wheel:
Many pieces of software implement timers with a time wheel; delayed task scheduling can be built on a time wheel or on multi-level time wheels. If we want to implement a delayed task queue ourselves, this algorithm is worth considering, but the maximum supported delay and the scheduling granularity (and the number of levels) must be designed for the specific requirements. I will not explain the time wheel algorithm here; if you are interested, you can look it up yourself.
5. Summary
Through the sections above, you should now have a clear picture of the asynchronous decoupling role and core ideas of message queues, and some sense of how to use MQ in your own architecture. Most problems in using MQ simply require us to think ahead and handle the details carefully to keep the business highly available. We can even extract the core of each solution and reuse the same ideas elsewhere in the business. For example: the core of ordering is that producers send related messages to a single partition and a single consumer consumes that partition in order; the core of avoiding loss is confirmation and degradation at every step; the core of consumption idempotence is a unique identifier; the core of backlog handling is a rapid-response emergency plan; and the core of delayed messages is sorting messages by time, with performance as the main optimization point.
The scientific method combines induction and deduction. While learning these solutions we extract the core ideas and then apply them deductively in practice; applying these distilled points to the business makes the corresponding problems easier to handle and helps us build a highly available architecture, which is what we most need to do.
References:
- Message Queue for Apache RocketMQ, Help Center, Alibaba Cloud
- Ding Wei, Zhou Jifeng. RocketMQ Technology Insider: RocketMQ Architecture Design and Implementation Principles. Machinery Industry Press
- Neha Narkhede, Gwen Shapira, Todd Palino. Kafka: The Definitive Guide. People's Posts and Telecommunications Press
Text/LISUXING