Editor’s Recommendation:

This article was written by the China Mobile Cloud Competence Center and is reprinted with authorization. It describes in detail the background and overall design of the cloud native message queue AMQP, which is built on Apache Pulsar.

The following article comes from the Su Yan Cloud Native account and was written by the cloud native middleware team.


Friendly reminder: The full text is more than 5000 words, and the expected reading time is 3 minutes

1. The background of self-developed message queue middleware AMQP

As the company's public cloud business develops and the number of users grows, data volume and request volume are increasing sharply. Message queues, an indispensable communication component, are under growing pressure. While maintaining our message queue components, we found that RabbitMQ, the message queue used by OpenStack, has several problems: messages are easily lost, cluster stability is poor, and faults are hard to troubleshoot. These problems led us to look for a replacement for the open source RabbitMQ. In addition, mainstream cloud vendors such as Alibaba Cloud, Huawei Cloud, and Tianyi Cloud already offer message queue products that support the AMQP protocol, while we lacked a comparable self-developed product. For these two reasons, the team began to develop a high-performance, highly reliable, and highly stable message queue product that supports the AMQP protocol.

Our needs

Combining the requirements of cloud products and mobile cloud components, the core requirements of message queue AMQP are summarized as follows:

  1. High throughput, low latency, and high reliability;
  2. The ability to scale out and in on demand, without a fixed capacity limit;
  3. Multi-tenant isolation, which is required for cloud products;
  4. The ability to accumulate massive numbers of messages without affecting performance or stability;
  5. Support for the AMQP protocol and compatibility with AMQP protocol clients (for example, the open source RabbitMQ SDK);
  6. Convenient operation and maintenance, including easy data migration and efficient cluster deployment and migration.

Open source product research

At the beginning of product design, the R&D team mainly investigated two open source products that implement the AMQP protocol: RabbitMQ and Qpid. Qpid was excluded because of its poor performance. RabbitMQ performs well in general and is widely used, but it does not meet our needs and usage scenarios, for the following reasons:

  1. Persistent messages perform poorly. To ensure that messages are not lost and can be traced back, they must be stored persistently. Our tests showed that RabbitMQ's persistent message performance is poor and does not meet our needs. For details, see the performance test comparison in Chapter 4.
  2. Mirrored queues bring their own problems. RabbitMQ's mirrored queues can be used to prevent message loss; they keep multiple copies of each message in the memory of multiple nodes. Our tests found two problems with mirrored queues enabled: first, cluster stability is poor; second, accumulated messages can cause memory overflow.
  3. Troubleshooting is difficult. RabbitMQ is written in Erlang, a relatively niche language, and its source code is hard to read, which makes problems hard to diagnose. RabbitMQ also cannot track messages in real time: message tracing requires enabling a plug-in, so problematic messages cannot be traced back in time.

Based on the above reasons, we decided to develop message queue AMQP ourselves.

2. Overall design of message queue AMQP

Computing and storage separation architecture

Considering design goals such as high stability, high reliability, and high throughput, the R&D team decided to adopt an architecture that separates computing from storage, in line with cloud-native design principles.

The coupled and the separated approaches to computing and storage compare as follows:

When storage and computing are coupled in a cluster, the following problems arise:

  1. Different applications, and different stages of development, need different ratios of storage space to computing power, which complicates machine selection;
  2. When either storage space or computing resources run short, both must be expanded together, so expansion is not cost-effective (the resource that was not needed is wasted);
  3. In cloud computing scenarios, true elastic computing cannot be achieved: the computing cluster also holds data, so shutting down idle computing nodes would lose data.
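The cost argument in point 2 can be made concrete with a little arithmetic. The sketch below is illustrative only: the workload figures and node sizes are hypothetical numbers chosen for the example, not measurements from the mobile cloud.

```python
import math

def coupled_nodes(need_cpu, need_disk_tb, node_cpu, node_disk_tb):
    """Nodes needed when every node bundles CPU and disk together."""
    return max(math.ceil(need_cpu / node_cpu),
               math.ceil(need_disk_tb / node_disk_tb))

def separated_nodes(need_cpu, need_disk_tb, compute_cpu, storage_disk_tb):
    """(compute nodes, storage nodes) when each layer scales on its own."""
    return (math.ceil(need_cpu / compute_cpu),
            math.ceil(need_disk_tb / storage_disk_tb))

# Hypothetical storage-heavy workload: 32 cores and 200 TB of disk.
need_cpu, need_disk_tb = 32, 200

# Coupled: each node has 16 cores and 10 TB, so disk dictates the node count.
nodes = coupled_nodes(need_cpu, need_disk_tb, node_cpu=16, node_disk_tb=10)
idle_cores = nodes * 16 - need_cpu

# Separated: 16-core compute nodes and 10 TB storage nodes are sized apart.
compute, storage = separated_nodes(need_cpu, need_disk_tb,
                                   compute_cpu=16, storage_disk_tb=10)

print(nodes, idle_cores)   # coupled node count and cores left idle
print(compute, storage)    # compute nodes and storage nodes
```

With these assumed numbers the coupled cluster buys 20 full nodes and leaves 288 cores idle, while the separated design needs only 2 compute nodes next to 20 storage nodes.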

To address the problems caused by coupling computing and storage, the R&D team also investigated Apache Pulsar, an open source message queue that separates computing from storage.

In Pulsar's architecture, computing and data storage are two separate components:

The computing layer is the Broker. Pulsar's Broker does not store message data directly; it mainly handles connections and requests from Consumers and Producers. If a workload has many Consumers and Producers, this layer can be scaled out on its own.

The storage layer is the Bookie. Pulsar uses Apache BookKeeper for storage. BookKeeper is a service framework that provides storage and persistence for streams of log entries. Code in the computing layer does not need to deal with most storage details.

Based on an analysis of the application scenarios and customer groups of message queue AMQP, we chose Pulsar, a message queue that separates computing from storage, as the prototype for the customized development of message queue AMQP. After investigating Pulsar and communicating in depth with the Pulsar community, we found that Pulsar's existing Protocol Handler mechanism suits our customized development needs very well. A Protocol Handler plug-in can reuse existing Pulsar components, such as the service discovery component (Topic Lookup), the distributed log library (ManagedLedger), and the consumption progress management component (Cursor), to implement part of its logic.

Therefore, Pulsar's Protocol Handler has become the basis for the customized development of our message queue AMQP. For more introduction to Protocol Handler, please refer to Pulsar PIP-41.

(https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler)

The overall architecture of the message queue AMQP is shown in the figure above. Development focuses on parsing and handling the communication layer protocol in the AMQP Protocol Handler, mapping the AMQP 0-9-1 protocol model to the Pulsar model, multi-tenant support, and the core message sending and consumption flows.

Core function point design

  • Message storage

The storage design of message queue AMQP draws on RabbitMQ and uses Pulsar's PersistentTopic to store both message entity data and index data.

1. RabbitMQ's message storage model

The message persistence of RabbitMQ actually includes two parts: the queue index (rabbit_queue_index) and the message store (rabbit_msg_store).

rabbit_queue_index maintains information about the messages of a queue that have been written to disk, including where each message is stored, whether it has been delivered to a consumer, and whether the consumer has acknowledged (ACKed) it. Each queue has its own rabbit_queue_index.

rabbit_msg_store stores messages as key-value pairs. Each node has exactly one, shared by all queues on the node. Technically, rabbit_msg_store is divided into msg_store_persistent and msg_store_transient: msg_store_persistent stores persistent messages, which survive a restart, while msg_store_transient stores non-persistent messages, which are lost after a restart.
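The split between a per-queue index and a node-wide message store can be pictured with a small data-structure sketch. This is only a model of the description above, not RabbitMQ's Erlang implementation; the class names IndexEntry, MessageStore, and QueueIndex are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    msg_id: str              # key into the shared message store
    delivered: bool = False  # handed to a consumer?
    acked: bool = False      # acknowledged by the consumer?

class MessageStore:
    """Node-wide key-value store shared by every queue on the node."""
    def __init__(self):
        self._bodies = {}

    def put(self, msg_id, body):
        self._bodies[msg_id] = body

    def get(self, msg_id):
        return self._bodies[msg_id]

class QueueIndex:
    """Per-queue ordered index: holds delivery state, not message bodies."""
    def __init__(self, store):
        self._store = store
        self._entries = []

    def publish(self, msg_id):
        self._entries.append(IndexEntry(msg_id))

    def deliver_next(self):
        # Return the first message that has not been delivered yet.
        for entry in self._entries:
            if not entry.delivered:
                entry.delivered = True
                return entry.msg_id, self._store.get(entry.msg_id)
        return None

    def ack(self, msg_id):
        for entry in self._entries:
            if entry.msg_id == msg_id:
                entry.acked = True

    def unacked(self):
        return [e.msg_id for e in self._entries if not e.acked]

# One store, two queues: the body is stored once and indexed twice.
store = MessageStore()
q1, q2 = QueueIndex(store), QueueIndex(store)
store.put("m1", b"hello")
q1.publish("m1")
q2.publish("m1")
msg_id, body = q1.deliver_next()
q1.ack(msg_id)
print(q1.unacked(), q2.unacked())
```

Message queue AMQP follows the same idea of keeping bodies and indexes apart, but stores both in Pulsar topics, as the following sections describe.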

2. BookKeeper provides storage support

The ManagedLedger in the Pulsar broker encapsulates the BookKeeper storage layer. ManagedLedger provides message persistence, message reading, and consumption progress management.
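The three responsibilities named here (persistence, reading, and progress tracking) can be illustrated with a toy append-only log that has named cursors. This is a conceptual stand-in held in memory, not the ManagedLedger API; the class ManagedLog and its methods are hypothetical.

```python
class ManagedLog:
    """Append-only log with named cursors (toy model, in memory only)."""

    def __init__(self):
        self._entries = []
        self._cursors = {}   # cursor name -> position of next unacked entry

    def append(self, payload):
        self._entries.append(payload)
        return len(self._entries) - 1   # position of the new entry

    def open_cursor(self, name):
        # Reopening an existing cursor resumes from its saved position.
        self._cursors.setdefault(name, 0)
        return name

    def read(self, name, max_entries=10):
        # Reading does not move the cursor; only mark_delete does, so
        # entries that were read but never acknowledged are read again.
        start = self._cursors[name]
        return list(enumerate(self._entries[start:start + max_entries], start))

    def mark_delete(self, name, position):
        # Everything up to and including `position` is acknowledged.
        self._cursors[name] = max(self._cursors[name], position + 1)

log = ManagedLog()
for payload in (b"a", b"b", b"c"):
    log.append(payload)

log.open_cursor("sub")
batch = log.read("sub", max_entries=2)   # positions 0 and 1
log.mark_delete("sub", batch[-1][0])     # acknowledge through position 1
log.open_cursor("sub")                   # "restart": progress is kept
print(log.read("sub"))
```

Because progress lives with the log rather than with the reader, a reader that restarts continues from the first unacknowledged entry instead of losing or skipping messages.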

3. How to map exchanges and queues onto Pulsar's existing model

AMQP 0-9-1 introduces some basic concepts, such as Exchange, Queue, and Router. These differ considerably from Pulsar's model, so we need a mapping between Pulsar's existing Topic-based publish/subscribe model and the business model of the AMQP protocol.

AmqpExchange

AmqpExchange contains an original-message Topic, which stores the messages sent by AMQP Producers. The Replicator of AmqpExchange forwards these messages to the AMQP queues. The Replicator is built on a Pulsar persistent cursor, which ensures that messages reach the queues without being lost.

AmqpMessageRouter

AmqpMessageRouter maintains the routing types and routing rules used to route messages from AmqpExchange to AmqpQueue. Metadata such as routing types and routing rules is persisted in Pulsar's ManagedLedger, so AmqpMessageRouter can be restored even if the Broker restarts.
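To make "routing types and routing rules" concrete, here is a minimal sketch of routing for three of the standard AMQP 0-9-1 exchange types: direct, fanout, and topic. The article does not say which types the product supports, so this selection is an assumption, the headers type is left out, and the names topic_matches and Router are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """AMQP 0-9-1 topic match: '*' is exactly one word, '#' is zero or more."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(i + 1, j) or (j < len(k) and match(i, j + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)

class Router:
    """Holds the bindings of one exchange and resolves the target queues."""

    def __init__(self, exchange_type):
        self.exchange_type = exchange_type   # "direct", "fanout" or "topic"
        self.bindings = []                   # (queue name, binding key)

    def bind(self, queue, binding_key=""):
        self.bindings.append((queue, binding_key))

    def route(self, routing_key):
        if self.exchange_type == "fanout":
            hits = [q for q, _ in self.bindings]
        elif self.exchange_type == "direct":
            hits = [q for q, b in self.bindings if b == routing_key]
        elif self.exchange_type == "topic":
            hits = [q for q, b in self.bindings if topic_matches(b, routing_key)]
        else:
            raise ValueError(f"unsupported exchange type: {self.exchange_type}")
        return sorted(set(hits))

router = Router("topic")
router.bind("audit", "#")
router.bind("orders", "order.*")
print(router.route("order.created"))
print(router.route("user.login"))
```

The bindings list is the part that would need to be persisted so that the router can be rebuilt after a Broker restart.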

AmqpQueue

AmqpQueue provides an index-message Topic, which stores the IndexMessages routed to this queue. An IndexMessage consists of the ID of the original message and the name of the Exchange that stores the message. When AmqpQueue delivers a message to a Consumer, it uses the IndexMessage to read the original message data and then sends that data to the Consumer.

  • Multi-tenant support

As an enterprise-level messaging system, Pulsar's multi-tenant capabilities can meet the following requirements:

  • Ensure isolation between different tenants
  • Enforce quotas for resource utilization
  • Provide per-tenant and system-level security
  • Keep operation and maintenance costs low and management as simple as possible

Pulsar's multi-tenancy is reflected directly in the topic URL, which has the following structure:

persistent://tenant/namespace/topic

In the AMQP 0-9-1 protocol definition, VirtualHost is the basic unit of resource isolation, which does not correspond exactly to Pulsar's multi-level model. In our implementation, the mobile cloud AMQP message queue introduces the concept of an Instance, which corresponds to the tenant in Pulsar; the component version uses a fixed tenant. The Namespace in Pulsar corresponds to the VirtualHost in AMQP. The correspondence for the component version is shown in the diagram below.
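As a rough sketch of this mapping in code, the function below builds a Pulsar topic name from an Instance (tenant), a VirtualHost (namespace), and a resource name. The function name, the way exchange or queue names become topic names, and the alias used for RabbitMQ's default vhost "/" are all assumptions made for illustration; the article does not describe the actual naming scheme.

```python
# Hypothetical alias: RabbitMQ's default vhost is named "/", which cannot
# appear inside a Pulsar namespace name, so this sketch renames it.
DEFAULT_VHOST_ALIAS = "default"

def pulsar_topic(tenant, vhost, name, persistent=True):
    """Build a topic name, treating the AMQP VirtualHost as the namespace."""
    namespace = DEFAULT_VHOST_ALIAS if vhost == "/" else vhost
    for part in (tenant, namespace, name):
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{name}"

# Two instances can reuse the same vhost and exchange names without clashing.
print(pulsar_topic("instance-a", "vhost1", "orders"))
print(pulsar_topic("instance-b", "vhost1", "orders"))
```

Because the tenant and namespace are part of every topic name, resources in different Instances or VirtualHosts never share a topic.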

  • Message flow process

  1. When the Producer sends a message to AmqpExchange, AmqpExchange persists the message to a Pulsar Topic (we call it the Topic that stores the original message).
  2. The Replicator of AmqpExchange passes the message to the Router.
  3. The Router determines whether the message needs to be routed to an AmqpQueue. If so, the ID of the original message is stored in the Topic of that AmqpQueue (we call it the Topic that stores the index message).
  4. AmqpQueue delivers the message to the Consumer.
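The four steps above can be sketched end to end with in-memory lists standing in for the two kinds of Topic. This is a simplified model under stated assumptions: a single exchange, direct matching only, and a list position used as the message ID. The class AmqpBrokerSketch and its methods are hypothetical, and a real IndexMessage would also carry the Exchange name.

```python
class AmqpBrokerSketch:
    def __init__(self):
        self.exchange_topic = []   # original messages; position = message ID
        self.replicator_pos = 0    # cursor: next original message to route
        self.bindings = {}         # queue name -> routing key (direct match)
        self.queue_topics = {}     # queue name -> list of index messages

    def bind(self, queue, routing_key):
        self.bindings[queue] = routing_key
        self.queue_topics.setdefault(queue, [])

    def publish(self, routing_key, body):
        # Step 1: persist the original message in the exchange topic.
        self.exchange_topic.append((routing_key, body))
        return len(self.exchange_topic) - 1

    def replicate(self):
        # Steps 2-3: hand each unrouted message to the router, which stores
        # only the message ID in the topic of every matching queue.
        while self.replicator_pos < len(self.exchange_topic):
            msg_id = self.replicator_pos
            routing_key, _ = self.exchange_topic[msg_id]
            for queue, key in self.bindings.items():
                if key == routing_key:
                    self.queue_topics[queue].append(msg_id)
            self.replicator_pos += 1

    def consume(self, queue):
        # Step 4: resolve index messages back to the original bodies.
        return [self.exchange_topic[i][1] for i in self.queue_topics[queue]]

broker = AmqpBrokerSketch()
broker.bind("q1", "a")
broker.bind("q2", "b")
broker.publish("a", b"m1")
broker.publish("b", b"m2")
broker.publish("a", b"m3")
broker.replicate()
print(broker.consume("q1"), broker.consume("q2"))
```

Each message body is stored once in the exchange topic, however many queues it is routed to; the queues hold only small index entries.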

3. Comparison of message queue AMQP and RabbitMQ

Cloud native

The "native" of cloud native means that the possibility of deployment in the cloud is considered at the beginning of the software design. The message queue AMQP adopts a core architecture of separation of computing and storage, which can make full use of distributed, elastically scalable cloud resources.

Message queues with a traditional architecture, such as RabbitMQ, store messages locally, so the Broker component handles both message distribution and message storage. As a result, the Broker is not a stateless service and cannot scale elastically.

The self-developed message queue AMQP uses an architecture that separates computing from storage. It splits the Broker role into a Broker for message distribution and a Bookie for message storage, which turns the Broker into a stateless service that can scale elastically.

Message reliability

Protection against message loss usually rests on two things: first, messages must be persisted to disk; second, multiple copies of each persisted message must be kept to improve the fault tolerance of the message queue. Compared with RabbitMQ, the self-developed AMQP has the following advantages in message reliability:

RabbitMQ flushes persistent messages to disk asynchronously, so it cannot guarantee that data survives extreme conditions such as power outages and hardware failures. Message queue AMQP natively supports synchronous flushing, which ensures that messages are not lost in any extreme failure scenario other than disk damage.

RabbitMQ keeps multiple synchronized copies of messages only when mirrored queues are enabled. Message queue AMQP natively stores multiple copies of each message, so data can be recovered from the copies when the disks of some nodes are damaged.

RabbitMQ performs poorly when message persistence and mirrored queues are enabled and cannot meet high-performance requirements. Message queue AMQP maintains very high performance through sequential file reads and a message caching mechanism.
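The difference between asynchronous and synchronous flushing comes down to when a write is confirmed. The generic sketch below confirms a write only after the data has been pushed to the storage device. It illustrates the idea with a plain file and is not the storage code of message queue AMQP or BookKeeper; the function names and the length-prefixed record format are assumptions.

```python
import os
import tempfile

def append_sync(path, payload):
    """Append one length-prefixed record and fsync before confirming."""
    with open(path, "ab") as f:
        f.write(len(payload).to_bytes(4, "big") + payload)
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
    return True                # only now is the write confirmed

def read_all(path):
    """Read every record back, as a recovery step after a restart would."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records

with tempfile.TemporaryDirectory() as directory:
    journal = os.path.join(directory, "journal.log")
    append_sync(journal, b"hello")
    append_sync(journal, b"world")
    recovered = read_all(journal)

print(recovered)
```

An asynchronous variant would return before the os.fsync call, which is faster per message but leaves a window in which a confirmed message exists only in memory.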

Fault tolerance

Fault tolerance refers to the availability of the cluster when some nodes in the cluster fail.

RabbitMQ's fault tolerance is not high. A network partition can cause data loss and make the cluster unavailable, especially when mirrored queues are configured.

Each component of the message queue AMQP can tolerate faults independently. The Broker is a stateless service: when one fails, its Queues are transferred to other Brokers without affecting the sending and receiving of messages. Although the Bookie is stateful, there is no master/slave distinction. As long as enough copies of a message exist, the service keeps running normally even if some Bookies are down or unavailable.
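The "enough copies" condition can be sketched as a quorum rule: a write succeeds once a minimum number of storage nodes confirm it, and any surviving copy can serve a read. The sketch below is a simplified model, not BookKeeper's client logic; the class Bookie, the two functions, and the choice of three nodes with an acknowledgement quorum of two are assumptions for illustration.

```python
class Bookie:
    """One storage node; all nodes are peers, with no master or slave."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.entries = {}

    def write(self, entry_id, data):
        if not self.up:
            return False
        self.entries[entry_id] = data
        return True

def replicated_write(bookies, entry_id, data, ack_quorum):
    """Write to every bookie; succeed if at least ack_quorum confirm."""
    acks = sum(b.write(entry_id, data) for b in bookies)
    return acks >= ack_quorum

def read_entry(bookies, entry_id):
    """Any live replica can serve the read."""
    for b in bookies:
        if b.up and entry_id in b.entries:
            return b.entries[entry_id]
    raise LookupError(f"entry {entry_id} unavailable")

bookies = [Bookie(name) for name in ("bookie-1", "bookie-2", "bookie-3")]
print(replicated_write(bookies, 0, b"x", ack_quorum=2))   # all three confirm

bookies[0].up = False                                     # one node fails
print(read_entry(bookies, 0))                             # still readable
print(replicated_write(bookies, 1, b"y", ack_quorum=2))   # two confirm

bookies[1].up = False                                     # a second node fails
print(replicated_write(bookies, 2, b"z", ack_quorum=2))   # quorum not reached
```

With one of three nodes down, both reads and writes continue; only when the number of live nodes drops below the quorum does the write path stop confirming.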

Maintainability

Compared with RabbitMQ, the self-developed AMQP provides a more complete operation, maintenance, and monitoring system. System indicators such as TPS, capacity, connection status, and consumer status are monitored in detail, and comprehensive documentation is provided to handle the problems that may occur and to make troubleshooting easier.

RabbitMQ cannot track messages in real time and cannot trace problematic messages back in time. Message queue AMQP has a message tracking function that makes all messages traceable and simplifies troubleshooting.

RabbitMQ is written in Erlang, a relatively niche and hard-to-read language, which makes it difficult to troubleshoot problems at the source code level. Message queue AMQP is self-developed middleware, so problems can be analyzed and resolved at the code level.

4. Performance

We tested the single-node performance of message queue AMQP and RabbitMQ in the same environment, with the following settings:

Message body size: 1KB

Number of exchanges: 1

Number of queues: 1

Whether the message is persistent: Yes

With 1 Queue, 1 Exchange, and 1KB messages, we compared the performance of message queue AMQP and RabbitMQ and obtained the line chart above. The chart shows that, under the same test conditions, the performance of message queue AMQP is far higher than that of RabbitMQ.

5. Summary

The AMQP message queue has now been officially launched on the mobile cloud, and everyone is welcome to subscribe and use it.

(https://ecloud.10086.cn/home/product-introduction/amqp)

In addition, replacing RabbitMQ in OpenStack with the component version of message queue AMQP has been verified in the test environment, and it will serve as the communication component for OpenStack components in the mobile cloud production environment in the future.

Reference

1. Apache Pulsar official website: https://pulsar.apache.org/docs/en/concepts-architecture-overview/

2. "RabbitMQ Practical Guide"



About the Author

Zhang Hao

Middleware development engineer of China Mobile Cloud Competence Center, head of message queue AMQP research and development, has rich experience in the field of message middleware and distributed caching.

Wang Shaojie

Software development engineer at China Mobile Cloud Competence Center, mainly responsible for the research and development, performance tuning, and maintenance of mobile cloud message queue products. He has rich practical and optimization experience with RocketMQ, Pulsar, RabbitMQ, and other systems.


