1. Background

RocketMQ (hereinafter MQ) is a message middleware widely used for transaction management, asynchronous decoupling, peak shaving, data synchronization and similar scenarios. When a business system is released in grayscale, our microservice governance and gateway platforms can isolate Dubbo and HTTP calls using the grayscale methods common in the industry. For MQ, however, none of the existing grayscale solutions fully solves both message isolation and the hand-over of consumption when switching back. To close this gap, we extended the Luban MQ platform with an MQ grayscale feature (covering root cause analysis, resource management, subscription relationship verification, delay optimization, and more).

2. Technical characteristics of RocketMQ

Why has MQ grayscale not been solved yet? Let's first review several core technical points of RocketMQ.

2.1 A brief description of the storage model

(Figure 2.1 MQ storage model)

CommitLog: where message bodies are actually stored. Every business message we send ends up in a commitLog. When MQ is deployed as a cluster of Brokers (kept simple here, ignoring master-slave replication), a given business message lands on only one Broker node in the cluster. The commitLog on that Broker stores all messages routed to it, across all Topics. When a commitLog file reaches 1 GB, a new commitLog file is created.
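The 1 GB rollover can be made concrete with a little offset arithmetic. The sketch below rests on two assumptions: RocketMQ's default 1 GB CommitLog file size, and its convention of naming each CommitLog file after the physical offset at which it starts, zero-padded to 20 digits. The class and method names are illustrative only and are not RocketMQ APIs.

```java
// Illustrative helper (not a RocketMQ API): maps a global physical offset in the
// CommitLog to the file that holds it and to the position inside that file.
final class CommitLogMath {
    // Assumed default CommitLog file size: 1 GB.
    static final long FILE_SIZE = 1024L * 1024 * 1024;

    // Files are named after their starting physical offset, zero-padded to 20 digits.
    static String fileNameFor(long physicalOffset) {
        long fileStart = physicalOffset - (physicalOffset % FILE_SIZE);
        return String.format("%020d", fileStart);
    }

    // Byte position of the message within its CommitLog file.
    static long positionInFile(long physicalOffset) {
        return physicalOffset % FILE_SIZE;
    }
}
```

For example, physical offset 1,073,741,924 (1 GB + 100 bytes) falls in the second file, 00000000001073741824, at position 100.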

Topic: the message topic, a logical collection of one class of messages. Each message belongs to exactly one topic, and a topic contains many messages. The topic is the basic unit in which MQ sends and subscribes to messages. It is the first-level message type and is designed around business logic.

Tag: the message tag, a second-level message type. Each message can optionally carry a Tag that distinguishes message types within the same topic. For example, in an order topic, Tag=tel can mark mobile phone orders and Tag=iot smart device orders. When sending, the producer can set a Tag on the message, and the consumer can subscribe to just the Tags it is interested in rather than to every message in the topic. (Strictly speaking, filtering during a pull is not done entirely on the Broker; part of it happens on the consumer side. This is not covered further here.)

Queue: a Topic is really a logical concept for users. At the source-code level, a Topic is distributed across multiple Brokers in the form of Queues, and a topic usually contains several Queues. (Note: a Topic for globally ordered messages has only one Queue, which is how global order is guaranteed.) Queues map onto the commitLog and can be understood as an index of the messages: a message can only be located through a specific Queue of its Topic. (Readers familiar with Kafka can compare a Queue to a partition.)
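To make "the Queue is an index of the message" concrete: in RocketMQ each Queue is backed by a ConsumeQueue file made of fixed-size 20-byte entries. Each entry holds the message's CommitLog offset (8 bytes), its size (4 bytes) and its Tag hash code (8 bytes). The sketch below encodes and decodes one such entry. The class name is illustrative.

```java
import java.nio.ByteBuffer;

// Illustrative model of one ConsumeQueue index entry (not a RocketMQ class).
final class ConsumeQueueEntry {
    static final int UNIT_SIZE = 20; // 8 + 4 + 8 bytes

    final long commitLogOffset; // where the message body sits in the CommitLog
    final int size;             // length of the message in the CommitLog
    final long tagsCode;        // hash code of the message's Tag, used for Broker-side filtering

    ConsumeQueueEntry(long commitLogOffset, int size, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.size = size;
        this.tagsCode = tagsCode;
    }

    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(UNIT_SIZE);
        buf.putLong(commitLogOffset).putInt(size).putLong(tagsCode);
        return buf.array();
    }

    static ConsumeQueueEntry decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        // Arguments are evaluated left to right, matching the write order above.
        return new ConsumeQueueEntry(buf.getLong(), buf.getInt(), buf.getLong());
    }

    // Fixed-size entries let the n-th message of a Queue be found by arithmetic alone.
    static long bytePositionOf(long queueOffset) {
        return queueOffset * UNIT_SIZE;
    }
}
```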

Group and GroupID: a group represents a class of Producers or Consumers that usually produce or consume messages of the same application domain, with the same production and consumption logic. Each consumer group is identified by a globally unique GroupID. Different consumer groups are isolated from each other during consumption and do not affect each other's consumption offsets.

2.2 Message sending and consumption

(Figure 2.2 Message sending and pulling model)

2.2.1 Client ID

Within a producer or consumer cluster, each running MQ client instance generates a unique ClientID. Note: when the same application instance acts as both producer and consumer, it uses the same ClientID for both roles.

2.2.2 Message sending

Before sending a message to a topic, the sender first fetches the topic's metadata, which lists the topic's Queues and the Broker each Queue belongs to. By default, the sender picks a Queue for each message by polling (round-robin), so the next message goes to a different Queue. MQ also lets the sender specify a Queue directly, or choose one with custom logic by implementing the MessageQueueSelector interface.
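Here is a minimal sketch of the two sending strategies, under some simplifying assumptions. Mq and QueueSelector are stand-ins for RocketMQ's MessageQueue and MessageQueueSelector. The real selector's select method also receives the Message, and the real client applies its default polling internally rather than through a selector. With the real client, a custom selector is passed as producer.send(msg, selector, arg).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for org.apache.rocketmq.common.message.MessageQueue.
final class Mq {
    final String topic;
    final String brokerName;
    final int queueId;

    Mq(String topic, String brokerName, int queueId) {
        this.topic = topic;
        this.brokerName = brokerName;
        this.queueId = queueId;
    }
}

// Stand-in for MessageQueueSelector; the real select() also receives the Message.
interface QueueSelector {
    Mq select(List<Mq> mqs, Object arg);
}

// Default-style polling: each send moves on to the next Queue of the topic.
final class RoundRobinSelector implements QueueSelector {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Mq select(List<Mq> mqs, Object arg) {
        return mqs.get(Math.floorMod(counter.getAndIncrement(), mqs.size()));
    }
}

// Custom selector: messages with the same business key (e.g. an order ID) share a Queue.
final class KeyHashSelector implements QueueSelector {
    @Override
    public Mq select(List<Mq> mqs, Object arg) {
        return mqs.get(Math.floorMod(arg.hashCode(), mqs.size()));
    }
}
```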

2.2.3 Message consumption

When consuming, consumers in the same consumer group (GroupID) subscribe to the Topic. Each consumer first fetches the Topic's metadata, that is, all of its Queues. The Queues are then assigned to individual clients (ClientIDs) according to an allocation strategy. For each assigned Queue, the client works out the offset to pull from and issues a PullRequest. After the pulled messages are consumed, the client acknowledges them and updates its consumption progress.
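The allocation strategy is pluggable, and RocketMQ's default is AllocateMessageQueueAveragely. The sketch below follows that averaging logic on simplified inputs: queues are integer IDs and clients are ClientID strings. Every client sorts the same inputs and runs the same deterministic function, so all clients agree on the assignment without coordinating with each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch modeled on RocketMQ's default averaging allocation (simplified types).
final class AverageAllocation {
    static List<Integer> allocate(List<Integer> queues, List<String> clientIds, String currentId) {
        List<Integer> mqAll = new ArrayList<>(queues);
        List<String> cidAll = new ArrayList<>(clientIds);
        Collections.sort(mqAll);
        Collections.sort(cidAll);
        List<Integer> result = new ArrayList<>();
        int index = cidAll.indexOf(currentId);
        if (index < 0 || mqAll.isEmpty()) {
            return result;
        }
        int mod = mqAll.size() % cidAll.size();
        // The first `mod` clients get one extra Queue when the split is uneven.
        int averageSize = mqAll.size() <= cidAll.size() ? 1
                : (mod > 0 && index < mod ? mqAll.size() / cidAll.size() + 1 : mqAll.size() / cidAll.size());
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        // range can be zero or negative when there are more clients than Queues.
        int range = Math.min(averageSize, mqAll.size() - startIndex);
        for (int i = 0; i < range; i++) {
            result.add(mqAll.get((startIndex + i) % mqAll.size()));
        }
        return result;
    }
}
```

With 5 Queues and two clients, the first client gets Queues 0 to 2 and the second gets Queues 3 and 4. A third client in a group with only 2 Queues gets nothing.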

The consumption progress recorded here is the smallest offset in the batch that has not yet been consumed successfully. As shown in Figure 2.3, suppose messages 1 and 5 of a batch have not been consumed while the rest have. The committed offset then stays at 1. After a shutdown and restart, consumption resumes from message 1, so messages 2, 3 and 4 are consumed again.
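The Figure 2.3 case can be reproduced with a small model of the progress tracking. The class is a simplified stand-in, loosely modeled on RocketMQ's ProcessQueue. The offset to commit is the smallest offset still in flight or, once nothing is in flight, the highest pulled offset plus one.

```java
import java.util.TreeSet;

// Simplified stand-in for per-Queue progress tracking (loosely modeled on ProcessQueue).
final class ConsumeProgress {
    private final TreeSet<Long> inFlight = new TreeSet<>(); // pulled but not yet consumed
    private long maxPulledOffset = -1;

    void pulled(long offset) {
        inFlight.add(offset);
        maxPulledOffset = Math.max(maxPulledOffset, offset);
    }

    // Marks one message as consumed and returns the offset that would be committed:
    // the smallest offset still in flight, or maxPulledOffset + 1 when none remain.
    long consumed(long offset) {
        inFlight.remove(offset);
        return inFlight.isEmpty() ? maxPulledOffset + 1 : inFlight.first();
    }
}
```

In the figure's scenario, messages 1 to 5 are pulled and 2, 3 and 4 are consumed. The committed offset is still 1, so a restart redelivers 2, 3 and 4.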

(Figure 2.3 Schematic diagram of consumption progress update)

Therefore, RocketMQ guarantees that messages are not lost, but not that each message is consumed only once. Idempotent message handling must be implemented by the business itself.

In addition, a consumer can subscribe to messages with specific Tags. When a PullRequest is served, the Broker filters quickly by comparing Tag hash codes stored in the Queue index of the storage model. The consumer then filters the returned messages exactly.
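The two filtering stages can be sketched as follows. The sketch assumes, as in RocketMQ, that the ConsumeQueue entry stores String.hashCode() of the Tag. Different Tags can share a hash code ("Aa" and "BB" both hash to 2112), so the Broker-side check may let extra messages through, and it is the consumer-side string comparison that makes filtering exact. TagFilter is an illustrative class, not a RocketMQ API.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative two-stage Tag filter (not a RocketMQ class).
final class TagFilter {
    private final boolean subscribeAll;
    private final Set<String> tags = new HashSet<>();
    private final Set<Integer> tagCodes = new HashSet<>();

    // subExpression uses RocketMQ's Tag syntax, e.g. "tel || iot", or "*" for all Tags.
    TagFilter(String subExpression) {
        subscribeAll = subExpression.trim().equals("*");
        if (!subscribeAll) {
            for (String part : subExpression.split("\\|\\|")) {
                String tag = part.trim();
                if (!tag.isEmpty()) {
                    tags.add(tag);
                    tagCodes.add(tag.hashCode());
                }
            }
        }
    }

    // Stage 1 (Broker): only the hash code stored in the ConsumeQueue entry is compared.
    boolean brokerMatches(long tagsCode) {
        return subscribeAll || tagCodes.contains((int) tagsCode);
    }

    // Stage 2 (consumer): the actual Tag string of the message is compared.
    boolean clientMatches(String tag) {
        return subscribeAll || (tag != null && tags.contains(tag));
    }
}
```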

2.3 Consistency of subscription relationship

On the consumer side, the MQ clients of all application instances in the same consumer group must keep their subscription relationship consistent. ("Same consumer group" means the same GroupID, and everything in this section assumes a single consumer group.) Consistency means that all clients in the group subscribe to exactly the same Topics and Tags. If subscriptions within a group are inconsistent, message consumption becomes chaotic and messages may even be lost.

2.3.1 Maintenance of subscription relationship

Each application instance's MQ client has its own ClientID. The subscription relationship is maintained as follows:

  • Each MQ consumer client sends heartbeats, carrying its ClientID, to all Brokers. A heartbeat contains the Topic & Tag subscriptions of that client. The registerConsumer method stores clients by consumer group, so all ClientIDs of the same group end up in the same ConsumerGroupInfo;
  • When a Broker receives a heartbeat from any consumer client, it stores the group information in the ConcurrentMap<String, ConsumerGroupInfo> consumerTable of the ConsumerManager class, keyed by GroupID. Each time a heartbeat arrives, the group's subscription information is updated from the Topic & Tag subscriptions it carries, regardless of which ClientID in the group sent it. In effect, only the subscription information from the latest heartbeat is kept. (The subVersion in a heartbeat marks its version and is updated when the rebalancing result changes; the Broker keeps only the subscription with the newest version.) Because a Tag is an attribute of a Topic, the Broker handles inconsistent Topics and inconsistent Tags slightly differently; see the updateSubscription method for details.

2.3.2 Impact of inconsistent subscriptions

Here is an example of the problems that inconsistent subscriptions cause. Suppose that, in the same consumer group, clientA subscribes to TOPIC_A and clientB subscribes to TOPIC_B, and that TOPIC_A and TOPIC_B both have 4 Queues. After queue allocation, the result is as follows:

(Table 2.1 Queue allocation result table of message consumers)

Because clientB does not subscribe to TOPIC_A and clientA does not subscribe to TOPIC_B, messages in Queue-2 and Queue-3 of TOPIC_A and in Queue-0 and Queue-1 of TOPIC_B cannot be consumed normally. Moreover, in practice even the assigned Queues are not consumed reliably: many exceptions appear on the client side, and messages in some Queues are never consumed and pile up.
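The outcome in Table 2.1 follows from how rebalancing works. Each client rebalances only the topics it subscribes to, but it splits each topic's Queues across every ClientID registered in the group. The sketch below reproduces this with a simplified even split (not RocketMQ's exact averaging code) and reports, per topic, the Queues that no client pulls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative model of Section 2.3.2; not RocketMQ code.
final class InconsistentSubscriptionDemo {
    // Simplified even, contiguous split of queue IDs 0..queueCount-1 among the sorted clients.
    static List<Integer> share(int queueCount, List<String> allClients, String me) {
        List<String> sorted = new ArrayList<>(allClients);
        Collections.sort(sorted);
        int i = sorted.indexOf(me);
        int n = sorted.size();
        List<Integer> mine = new ArrayList<>();
        for (int q = i * queueCount / n; q < (i + 1) * queueCount / n; q++) {
            mine.add(q);
        }
        return mine;
    }

    // subscriptionByClient: ClientID -> the single topic it subscribes to.
    // Returns, per topic, the queues that no subscribing client ends up pulling.
    static Map<String, Set<Integer>> orphanedQueues(Map<String, String> subscriptionByClient, int queuesPerTopic) {
        // Every client of the group is counted when splitting, whatever it subscribes to.
        List<String> groupClients = new ArrayList<>(subscriptionByClient.keySet());
        Map<String, Set<Integer>> orphaned = new TreeMap<>();
        for (String topic : new TreeSet<>(subscriptionByClient.values())) {
            Set<Integer> unowned = new TreeSet<>();
            for (int q = 0; q < queuesPerTopic; q++) {
                unowned.add(q);
            }
            for (String client : groupClients) {
                // Only clients that subscribe to this topic rebalance it and claim their share.
                if (topic.equals(subscriptionByClient.get(client))) {
                    unowned.removeAll(share(queuesPerTopic, groupClients, client));
                }
            }
            orphaned.put(topic, unowned);
        }
        return orphaned;
    }
}
```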

Likewise, when clients subscribe to different Tags, the messages pulled will not match the messages each client actually subscribed to.

3. MQ grayscale solutions in the industry

(Figure 3.1 Schematic diagram of grayscale call)

In general, business grayscale strictly isolates only the calls between RPC services, and some loss or misrouting of grayscale message traffic is tolerated. As shown in Figure 3.1, grayscale messages produced by V_BFF are received by both the normal and the grayscale versions of V_TRADE and consumed at random. Some grayscale traffic therefore does not reach the intended environment, even though RPC calls as a whole still keep the grayscale and non-grayscale environments isolated. When a release changes message consumption logic, or when grayscale messages must not affect online data, MQ itself needs grayscale support.

Because of the subscription-consistency constraint, the MQ grayscale solutions currently used in the industry all give the normal version and the grayscale version different GroupIDs. All of the following schemes use different GroupIDs.

3.1 Shadow Topic scheme

Create a parallel set of topics to carry isolated grayscale messages. For example, TOPIC_ORDER gets a TOPIC_ORDER_GRAY counterpart for use in the grayscale environment.

When sending, grayscale messages are written to the shadow topic. When consuming, the grayscale environment subscribes to the grayscale topic with its own grayscale group.

3.2 Tag scheme

When sending, the producer adds a grayscale mark to the Tag of messages produced in the grayscale environment. For each grayscale release, the grayscale version's consumers subscribe only to the grayscale Tags, while the normal version subscribes to the non-grayscale Tags.

3.3 UserProperty scheme

When sending, the producer adds a grayscale mark to the UserProperty of messages produced in the grayscale environment. The consumer client must be modified to filter on the UserProperty: the normal version skips such messages, and the grayscale version processes them.
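Here is a minimal sketch of the consumer-side filtering in the UserProperty scheme. RocketMQ's Message does provide putUserProperty and getUserProperty. In the sketch, Msg is a simplified stand-in, and the property name GRAY_FLAG is hypothetical. Note that every consumer still pulls every message and simply discards the ones meant for the other version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for RocketMQ's Message, which offers putUserProperty/getUserProperty.
final class Msg {
    final String body;
    final Map<String, String> userProperties = new HashMap<>();

    Msg(String body) {
        this.body = body;
    }
}

final class UserPropertyFilter {
    static final String GRAY_KEY = "GRAY_FLAG"; // hypothetical property name

    // Producer side: the gray environment marks the messages it produces.
    static Msg markGray(Msg msg) {
        msg.userProperties.put(GRAY_KEY, "true");
        return msg;
    }

    static boolean isGray(Msg msg) {
        return "true".equals(msg.userProperties.get(GRAY_KEY));
    }

    // Consumer side: the gray version keeps only gray messages, the normal version skips them.
    static List<Msg> accept(List<Msg> pulled, boolean grayConsumer) {
        List<Msg> kept = new ArrayList<>();
        for (Msg m : pulled) {
            if (isGray(m) == grayConsumer) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```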

3.4 Shortcomings of the current schemes

We won't compare the merits of the three schemes here, but they share the following defects. (There are other defects and development needs, but those are not fatal.) None of them can switch from the grayscale state back to the normal state without losing messages, so in practice every such grayscale effort is started and then abandoned:

  • Because different consumer groups are used, how can the grayscale version, once verified, correctly pick up from the consumption offset of the original normal consumer group, so that processing stays efficient and no messages are lost?
  • How can grayscale messages be consumed accurately, so that every message carrying the grayscale mark is processed efficiently and none is lost?
  • When grayscale is enabled, from which offset should grayscale consumption start, and how are the details of the state transitions controlled?

4. Grayscale scheme of the Luban MQ platform

At its core, MQ grayscale has two requirements. First, grayscale and non-grayscale messages must be isolated efficiently, so that each consumer gets exactly the version of messages it needs. Second, when grayscale ends, consumption must resume correctly from the right message offsets without losing any message that still has to be processed. In other words, the details of the state transitions must be managed. To achieve this, the scheme makes the following changes.

The code shown for this scheme is test code meant to illustrate it; the production code handles more details.

4.1 Isolated use of Queue

(Figure 4.1 Differential use of Queue)

We already know that the Queue is the actual execution unit of a Topic. Our idea is to use Queues to separate v1 (normal) messages from v2 (grayscale) messages. The first and last Queues (how many is configurable) are reserved for sending and receiving grayscale messages, and the remaining Queues carry normal online messages. Unlike the usual industry approach, we keep a single consumer group (the same GroupID). Grayscale consumers take part only in rebalancing the grayscale Queues, and non-grayscale consumers only in rebalancing the non-grayscale Queues.

Here we solve the storage isolation problem of messages.

4.2 Transformation of Broker subscription relationship

A grayscale version often changes its Topic or Tag. Because we did not add a separate grayscale consumer group, such changes make the subscriptions within the group inconsistent. As explained earlier in the section on subscription consistency, the Broker must therefore be modified to tolerate differences between grayscale and non-grayscale subscriptions.

The subscriptions of a consumer group are kept in the subscriptionTable of ConsumerGroupInfo. We add a graySubscriptionTable to ConsumerGroupInfo to store the grayscale version's subscriptions. Each client reports its own grayscale flag, grayFlag, and the Broker uses it to decide whether that client's subscription goes into subscriptionTable or graySubscriptionTable. When pulling messages, the client also sends grayFlag to the Broker, which uses it to look up the subscription in subscriptionTable or in graySubscriptionTable.
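Below is a hypothetical sketch of the ConsumerGroupInfo change. It assumes grayFlag reaches the Broker with each heartbeat and each pull request. Only subscriptionTable, graySubscriptionTable and grayFlag come from the design described above. The class, the method names and the simplified topic-to-Tag maps (with no subVersion handling) are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gray-aware ConsumerGroupInfo (topic -> Tag expression maps, no subVersion handling).
final class GrayAwareConsumerGroupInfo {
    private final Map<String, String> subscriptionTable = new ConcurrentHashMap<>();     // non-gray clients
    private final Map<String, String> graySubscriptionTable = new ConcurrentHashMap<>(); // gray clients

    private Map<String, String> tableFor(boolean grayFlag) {
        return grayFlag ? graySubscriptionTable : subscriptionTable;
    }

    // Heartbeat path: a gray heartbeat only touches the gray table, so the two versions
    // can subscribe to different Topics/Tags without overwriting each other.
    void updateSubscription(boolean grayFlag, Map<String, String> topicToTags) {
        Map<String, String> table = tableFor(grayFlag);
        table.keySet().retainAll(topicToTags.keySet());
        table.putAll(topicToTags);
    }

    // Pull path: the Broker filters with the subscription of the requesting client's version.
    String findSubscription(boolean grayFlag, String topic) {
        return tableFor(grayFlag).get(topic);
    }
}
```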

Here we solve the consumption subscription consistency problem.

4.3 Transformation of Producer

The producer change is relatively simple. The producer only needs to determine whether a message is a grayscale message and, by implementing the MessageQueueSelector interface, deliver grayscale messages to a given number of grayscale Queues. The number of grayscale Queues, grayQueueSize, is defined in the configuration center. Currently, the convention is to use designated Queue numbers on the Broker as the grayscale Queues.

TOPIC_V_ORDER has 6 Queues in total. As shown in Figure 4.2, grayscale messages are sent only to the first and last Queues, 0 and 5, and non-grayscale messages are sent to the remaining 4 Queues.

(Figure 4.2 Sending Results)
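The producer-side selection can be sketched as below. The sketch uses a stand-in Mq for MessageQueue and passes "is this a gray message" as the selector argument. These are assumptions, not the platform's code: every Broker hosts the same number of Queues, and the grayQueueSize gray Queues are split between the head and tail of each Broker's range (grayQueueSize = 2 gives Queues 0 and 5 out of 6, as in Figure 4.2). Separate counters keep gray and normal traffic each rotating evenly over its own Queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for RocketMQ's MessageQueue.
final class Mq {
    final String brokerName;
    final int queueId;

    Mq(String brokerName, int queueId) {
        this.brokerName = brokerName;
        this.queueId = queueId;
    }
}

// Hypothetical gray-aware selector; mirrors MessageQueueSelector.select(mqs, msg, arg) minus msg.
final class GrayQueueSelector {
    private final int grayQueueSize;   // would come from the configuration center
    private final int queuesPerBroker; // assumption: same Queue count on every Broker
    private final AtomicInteger grayCounter = new AtomicInteger();
    private final AtomicInteger normalCounter = new AtomicInteger();

    GrayQueueSelector(int grayQueueSize, int queuesPerBroker) {
        this.grayQueueSize = grayQueueSize;
        this.queuesPerBroker = queuesPerBroker;
    }

    // Assumed rule: ceil(n/2) gray Queues at the head of each Broker's range, floor(n/2) at the tail.
    boolean isGrayQueue(int queueId) {
        int head = (grayQueueSize + 1) / 2;
        int tail = grayQueueSize / 2;
        return queueId < head || queueId >= queuesPerBroker - tail;
    }

    Mq select(List<Mq> mqs, Object isGrayMessage) {
        boolean gray = Boolean.TRUE.equals(isGrayMessage);
        List<Mq> candidates = new ArrayList<>();
        for (Mq mq : mqs) {
            if (isGrayQueue(mq.queueId) == gray) {
                candidates.add(mq);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no " + (gray ? "gray" : "normal") + " queue available");
        }
        AtomicInteger counter = gray ? grayCounter : normalCounter;
        return candidates.get(Math.floorMod(counter.getAndIncrement(), candidates.size()));
    }
}
```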

This solves correct message delivery on the producer side.

4.4 Transformation of Consumer

The consumer changes mainly concern the rebalancing strategy for grayscale and non-grayscale Queues, and the update and synchronization of each client's grayscale flag, grayFlag.

The core of the grayscale rebalancing strategy is to handle grayscale and non-grayscale Queues separately: grayscale Queues are assigned to grayscale ClientIDs, and non-grayscale Queues to non-grayscale ClientIDs. Therefore, before rebalancing, each client fetches from Namesrv the latest grayFlag of every client in the group (that is, the state is recorded in Namesrv).

When the grayscale version is to become the online version, each client syncs its grayFlag to Namesrv. To avoid leaving grayscale messages unconsumed, the client first checks whether the grayscale Queues still hold unconsumed messages. It updates grayFlag only after those grayscale messages have been consumed.

Consumers use AllocateMessageQueueGray as their rebalancing strategy and pass in the number of grayscale Queues. Grayscale consumers call setGrayFlag with true, and they consume only the messages in Queues 0 and 5 at the head and tail. Non-grayscale consumers call setGrayFlag with false, and they consume only the messages in the 4 middle Queues. The console also shows the Queue allocation clearly: the v2 clients with grayFlag true are assigned the first and last Queues, and the v1 clients with grayFlag false are assigned the 4 middle Queues.

(Figure 4.3 Consumption and subscription results)
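The idea behind AllocateMessageQueueGray can be sketched as "split first, then average". This is a hypothetical reconstruction, not the platform's class. Queues are integer IDs. grayFlags holds the grayFlag of every ClientID in the group, as fetched from Namesrv before rebalancing. isGrayQueue encodes the head/tail rule, and the split inside each partition is a simplified even split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch of gray-aware rebalancing; not the platform's AllocateMessageQueueGray.
final class GrayAllocation {
    static List<Integer> allocate(List<Integer> queueIds, Map<String, Boolean> grayFlags,
                                  String currentClientId, IntPredicate isGrayQueue) {
        boolean meGray = Boolean.TRUE.equals(grayFlags.get(currentClientId));
        // Keep only the Queues and the clients on the same side of the gray/normal split.
        List<Integer> queues = new ArrayList<>();
        for (int q : queueIds) {
            if (isGrayQueue.test(q) == meGray) {
                queues.add(q);
            }
        }
        List<String> peers = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : grayFlags.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue()) == meGray) {
                peers.add(e.getKey());
            }
        }
        Collections.sort(queues);
        Collections.sort(peers);
        List<Integer> mine = new ArrayList<>();
        int i = peers.indexOf(currentClientId);
        if (i < 0) {
            return mine;
        }
        // Even, contiguous split inside the partition (a simplification of averaging).
        int n = peers.size();
        for (int k = i * queues.size() / n; k < (i + 1) * queues.size() / n; k++) {
            mine.add(queues.get(k));
        }
        return mine;
    }
}
```

After every client is marked non-gray, the same function assigns Queues 1 to 4 to the four clients and leaves Queues 0 and 5 unassigned.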

When the grayscale version is to be switched to the online version, you only need to call updateClientGrayFlag to update the state. After the call, the two original v2 grayscale clients finish consuming the messages in the grayscale Queues. Their grayFlag then actually becomes false (the state is saved in Namesrv), and they join the rebalancing of the 4 middle non-grayscale Queues. The 2 grayscale Queues at the head and tail are then no longer subscribed by any consumer.

(Figure 4.4 grayFlag update)
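The ordering used when switching (drain the gray Queues first, then flip grayFlag) can be sketched as a small guard. NamesrvClient is a hypothetical interface whose method name follows the updateClientGrayFlag call described above. The two offset maps stand in for the Broker's max offsets and the group's committed offsets of the gray Queues.

```java
import java.util.Map;

// Hypothetical client-side view of the Namesrv extension; the method name follows the article.
interface NamesrvClient {
    void updateClientGrayFlag(String group, String clientId, boolean grayFlag);
}

// Hypothetical guard: a gray client flips its grayFlag only after its gray Queues are drained.
final class GraySwitchGuard {
    // maxOffsets: gray queueId -> next offset the Broker will write (the end of the Queue).
    // consumedOffsets: gray queueId -> the group's committed consume offset.
    static boolean grayQueuesDrained(Map<Integer, Long> maxOffsets, Map<Integer, Long> consumedOffsets) {
        for (Map.Entry<Integer, Long> e : maxOffsets.entrySet()) {
            long consumed = consumedOffsets.getOrDefault(e.getKey(), 0L);
            if (consumed < e.getValue()) {
                return false; // messages are still waiting in this gray Queue
            }
        }
        return true;
    }

    // Returns true if the flag was switched; otherwise the caller retries later.
    static boolean trySwitchToNormal(NamesrvClient namesrv, String group, String clientId,
                                     Map<Integer, Long> maxOffsets, Map<Integer, Long> consumedOffsets) {
        if (!grayQueuesDrained(maxOffsets, consumedOffsets)) {
            return false;
        }
        namesrv.updateClientGrayFlag(group, clientId, false);
        return true;
    }
}
```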

This solves the fine-grained control of state switching.

4.5 Transformation of Namesrv

As mentioned above, consumers need the grayFlag of every client in the group when rebalancing. These flags must therefore be stored persistently somewhere that every consumer can reach. We chose to store them in Namesrv, for the following reasons:

  • Namesrv is relatively lightweight and has good stability;
  • Consumers already keep a long-lived connection to Namesrv. If a Namesrv node goes down, the consumer automatically connects to the next one until a connection is available;
  • The Broker is where messages are actually stored and is already under considerable load. Synchronizing grayscale data there would add to that load.

However, Namesrv nodes are stateless and do not synchronize information with each other, so the consistency of the grayscale data has to be guaranteed by a database. All Namesrv nodes access the same database, which stores the grayscale information persistently. Every update of the v1/v2 grayscale state modifies the database through Namesrv, and before each rebalancing the grayscale state of all instances in the consumer group is pulled through Namesrv.

(Figure 4.5 Schematic diagram of Namesrv storing grayscale data)

Here we solve the problem of state storage and synchronization.

5. Verification of grayscale scenarios

Testing is the real check of whether a scheme is feasible. Below, a simple demo verifies the MQ grayscale scheme of the Luban platform.

5.1 Topic & Tag remain unchanged in grayscale version

This scenario was already verified in Sections 4.3 and 4.4 and is not repeated here.

5.2 Topic added in grayscale version

Suppose v1 and v2 subscribe as shown in Table 5.1; the resulting Topic subscriptions are shown in Figure 5.1. TOPIC_V_ORDER is subscribed by both v1 and v2, so its first and last Queues are assigned to the grayscale v2 clients and its 4 middle Queues to the non-grayscale v1 clients. TOPIC_V_PAYMENT is subscribed only by the grayscale version v2, so only its first and last Queues are assigned to v2 clients, and its 4 middle Queues are not subscribed by any client. We send 4 non-grayscale messages and 4 grayscale messages to TOPIC_V_ORDER, and 4 grayscale messages to TOPIC_V_PAYMENT. Figure 5.2 shows that the non-grayscale messages of TOPIC_V_ORDER are consumed by the two v1 clients, and that the grayscale messages of TOPIC_V_ORDER and TOPIC_V_PAYMENT are consumed by the two v2 clients.

(Table 5.1 Subscription Information Sheet)

(Figure 5.1 Subscription Results)

(Figure 5.2 Consumption Results)

5.3 Topic reduction in grayscale version

Suppose v1 and v2 subscribe as shown in Table 5.2; the resulting Topic subscriptions are shown in Figure 5.3. TOPIC_V_ORDER is subscribed by both v1 and v2, so its first and last Queues are assigned to the grayscale v2 clients and its 4 middle Queues to the non-grayscale v1 clients. TOPIC_V_PAYMENT is subscribed only by the non-grayscale version v1, so only its 4 middle Queues are assigned to v1 clients, and its first and last Queues are not subscribed by any client. We send 4 non-grayscale messages and 4 grayscale messages to TOPIC_V_ORDER, and 4 non-grayscale messages to TOPIC_V_PAYMENT. Figure 5.4 shows that the non-grayscale messages of TOPIC_V_ORDER and TOPIC_V_PAYMENT are consumed by the two v1 clients, and that the grayscale messages in TOPIC_V_ORDER are consumed by the two v2 clients.

(Table 5.2 Subscription Information Sheet)

(Figure 5.3 Subscription Results)

(Figure 5.4 Consumption Results)

5.4 Grayscale version Tag changes

Suppose v1 and v2 subscribe as shown in Table 5.3; the resulting Topic subscriptions are shown in Figure 5.5. TOPIC_V_ORDER is subscribed by both v1 and v2, so its first and last Queues are assigned to the grayscale v2 clients and its 4 middle Queues to the non-grayscale v1 clients. We send 4 non-grayscale messages with Tag=v1 and 4 grayscale messages with Tag=v2 to TOPIC_V_ORDER. Figure 5.6 shows that the non-grayscale messages with Tag=v1 are consumed by the two v1 clients, and that the grayscale messages with Tag=v2 are consumed by the two v2 clients.

(Table 5.3 Subscription Information Sheet)

(Figure 5.5 Subscription Results)

(Figure 5.6 Consumption Results)

5.5 Topic & Tag Mixed Changes in Grayscale Version

Suppose v1 and v2 subscribe as shown in Table 5.4. The resulting Topic subscriptions, shown in Figure 5.7, are the same as in Section 5.2 and are not repeated here. We send 4 non-grayscale messages with Tag=v1 and 4 grayscale messages with Tag=v2 to TOPIC_V_ORDER, and 4 grayscale messages to TOPIC_V_PAYMENT. The consumption results are shown in Figure 5.8. The two v2 clients consume the Tag=v2 grayscale messages in TOPIC_V_PAYMENT and TOPIC_V_ORDER, while the two v1 clients consume only the Tag=v1 non-grayscale messages in TOPIC_V_ORDER.

(Table 5.4 Subscription Information Table)

(Figure 5.7 Subscription Results)

(Figure 5.8 Consumption Results)

6. Conclusion

For production use of MQ grayscale, we also wrap the MQ producer and consumer in a unified way, so the business side only needs to configure graySwitch and grayFlag. graySwitch marks whether grayscale messages are enabled. grayFlag takes effect only when graySwitch is on, and it marks whether the current client is a grayscale client.

When multiple systems interact, a business system can use graySwitch to decide whether to consume all of another system's messages, grayscale and non-grayscale alike. It can use grayFlag to decide whether to consume only grayscale messages or only non-grayscale messages. graySwitch and grayFlag can be kept in the configuration center and hot-reloaded. When grayscale traffic needs to be switched, a script can change grayFlag uniformly, switching grayscale traffic across the whole call chain without loss.
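The interplay of the two parameters can be summarized in a few lines. This is a hypothetical sketch of the consumer-side semantics described above. With graySwitch off, all messages are consumed. With graySwitch on, grayFlag picks the gray side or the non-gray side. QueueScope and GrayConfig are illustrative names, and how the real wrapper applies the result is not specified here.

```java
// Hypothetical sketch: which Queues a consumer should take part in, given the two switches.
enum QueueScope { ALL_QUEUES, GRAY_QUEUES_ONLY, NORMAL_QUEUES_ONLY }

final class GrayConfig {
    final boolean graySwitch; // is gray message isolation enabled at all?
    final boolean grayFlag;   // is this client a gray client? only read when graySwitch is on

    GrayConfig(boolean graySwitch, boolean grayFlag) {
        this.graySwitch = graySwitch;
        this.grayFlag = grayFlag;
    }

    QueueScope consumeScope() {
        if (!graySwitch) {
            return QueueScope.ALL_QUEUES; // consume gray and non-gray messages alike
        }
        return grayFlag ? QueueScope.GRAY_QUEUES_ONLY : QueueScope.NORMAL_QUEUES_ONLY;
    }
}
```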

In addition, we use Namesrv to control state switching at a fine granularity. This ensures that unconsumed grayscale messages are consumed before the switch actually takes place.

Finally, many thanks to Alibaba for open-sourcing the RocketMQ messaging middleware!

Author: vivo process IT team - Ou Erli, Xiong Huanxin

vivo Internet Technology