A first look at RocketMQ 5.0, the cloud-native hyper-converged platform for messages, events, and streams.
Preface: This article is compiled from the talk given by Financial Communication at the RocketMQ x EventMesh OpenDay.
Author | Financial Link
Today's topic is a first look at RocketMQ 5.0, the cloud-native hyper-converged platform for messages, events, and streams. The talk is divided into three parts:
First, I will review the history of RocketMQ 4, the first choice for business messaging, and the evolution of the 4.x versions.
Second, I will introduce RocketMQ 5.0 and some of its new features in detail.
Finally, I will present the RocketMQ 5.0 roadmap, so that community members can take part in contributing to 5.0.
RocketMQ development history
RocketMQ has gone through four generations of architecture since its inception, evolving alongside enterprise IT architecture from the SOA era to the microservice era and now to the cloud-native era. RocketMQ was born at Alibaba. In its early days, Alibaba had several self-developed messaging engines, such as Notify at Taobao and Napoli in the B2B business. However, both Napoli and Notify used relational databases for storage, which introduced some hidden risks.
So in 2011, Alibaba developed MetaQ, which uses the file system for storage. After continuous exploration and a rewrite of MetaQ 2.0, the first generation of RocketMQ was born and named RocketMQ 3.0. In 2013, Alibaba open-sourced RocketMQ, and in 2016 donated it to the Apache Software Foundation. In 2017, RocketMQ graduated from incubation and officially became an Apache top-level project.
After entering the Apache Software Foundation, RocketMQ 4.x developed rapidly and many versions were released. It made a huge leap in architecture, multi-replica capabilities, message types, and message governance. At the same time, the community ecosystem has kept growing, and the number of contributors worldwide is close to 500.
With the advent of the cloud-native era and the rise of real-time computing, RocketMQ is also undergoing a comprehensive upgrade, released as RocketMQ 5.0. Together with our community partners, we define RocketMQ 5.0 as a cloud-native hyper-converged platform for messages, events, and streams.
RocketMQ 4 review
Looking back on RocketMQ 4, we have always emphasized that RocketMQ is the first choice for business messaging. Many companies use RocketMQ on their core transaction links, and many even build two messaging systems: Kafka for data analysis and RocketMQ for business message processing.
So why has RocketMQ become the common choice of so many companies? The following points offer some insight:
First, RocketMQ is a financial-grade, highly reliable product. Compared with other messaging middleware, RocketMQ has been verified at very large scale. Almost all of Alibaba's message links are built on RocketMQ, including the core transaction links. For example, on the day of Double Eleven, RocketMQ carries trillions of messages. At the same time, the messaging service on Alibaba Cloud serves tens of thousands of companies, and these large enterprises have extremely demanding SLA requirements. This in-house practice and the polishing gained from serving a large number of real customers play a vital role in the stability of the messaging system.
Second, RocketMQ has a minimalist architecture and is easy to maintain. The entire cluster consists of NameServers and Brokers. The NameServer performs route discovery, and the Brokers are the nodes that actually store data. As can be seen from the architecture diagram, RocketMQ adopts a two-master, two-slave cluster mode, in which each slave node receives data from its master node through synchronous or asynchronous replication. This deployment mode gives the service high availability.
With multiple Broker groups deployed, even if the Master of one group becomes unavailable, messages can still be sent to the Masters of the other groups, and Consumers can also read from the Slave. The NameServer is completely stateless. Even if every NameServer goes down, clients have already cached the routing information, so existing services are not affected. Operating RocketMQ is also very easy. To scale out, you only need to add Broker groups. If a Broker group has a problem, it can be disabled, and its routes are removed immediately without affecting other services.
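The failover behaviour described above can be sketched as follows. This is a minimal simulation rather than RocketMQ's client code: `BrokerGroup`, `select_master`, `select_reader`, and the round-robin policy are hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BrokerGroup:
    """One master/slave pair; the fields are illustrative only."""
    name: str
    master_up: bool = True
    slave_up: bool = True


def select_master(groups, start):
    """Round robin from `start`, skipping groups whose master is down."""
    for i in range(len(groups)):
        group = groups[(start + i) % len(groups)]
        if group.master_up:
            return group
    raise RuntimeError("no writable broker group available")


def select_reader(group):
    """Consumers read from the master and fall back to the slave."""
    if group.master_up:
        return f"{group.name}-master"
    if group.slave_up:
        return f"{group.name}-slave"
    raise RuntimeError(f"{group.name} is fully down")


groups = [BrokerGroup("broker-a"), BrokerGroup("broker-b")]
groups[0].master_up = False  # simulate a master failure in broker-a

send_target = select_master(groups, start=0).name  # writes move to broker-b
read_target = select_reader(groups[0])             # broker-a is read via its slave
```

In this sketch, writes are redirected to another group while the data already stored in the failed group remains readable from its slave, which is the property the paragraph above relies on.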
Deploying RocketMQ is also very simple. With the JAR distribution, two commands are enough to bring RocketMQ up. Deploying on K8s is even simpler: with RocketMQ Operator, the entire cluster can be brought up with a single kubectl apply command.
Third, RocketMQ offers rich message types. It supports normal messages, ordered messages, delayed messages, retry messages, dead-letter messages, transaction messages, and more. In terms of message governance, in addition to the common subscription modes (broadcast mode and cluster mode), RocketMQ also supports message query, message replay, message tracing, ACL permission control, and so on. RocketMQ is also one of the few messaging products in the industry that natively supports server-side filtering, which gives users richer usage scenarios and makes full use of server-side computing resources. RocketMQ supports not only tag filtering but also SQL92 filtering. Tag filtering already meets most filtering requirements; for particularly complex scenarios, you can consider SQL92 filtering. Together, the two methods can satisfy essentially all message filtering requirements. Compared with other messaging middleware, RocketMQ has the richest set of message types and message governance features.
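To make server-side filtering concrete, here is a minimal sketch of tag filtering. The `TagA || TagB` expression form follows RocketMQ's tag subscription syntax; the function names and message dictionaries are hypothetical, and the SQL92 case is only approximated by a plain Python predicate rather than a real SQL92 parser.

```python
def parse_tag_expression(expression):
    """Split an expression such as "TagA || TagB" into a set of tags.

    "*" or an empty expression means "match everything" and returns None.
    """
    expression = expression.strip()
    if expression in ("", "*"):
        return None
    return {tag.strip() for tag in expression.split("||") if tag.strip()}


def broker_side_filter(messages, expression):
    """Keep only matching messages, as the server would before delivery."""
    wanted = parse_tag_expression(expression)
    if wanted is None:
        return list(messages)
    return [m for m in messages if m["tag"] in wanted]


messages = [
    {"id": 1, "tag": "TagA", "price": 10},
    {"id": 2, "tag": "TagB", "price": 50},
    {"id": 3, "tag": "TagC", "price": 90},
]

by_tag = broker_side_filter(messages, "TagA || TagB")
# Stand-in for an SQL92 expression on message properties, e.g. "price > 40".
by_property = [m for m in messages if m["price"] > 40]
```

Because the filtering happens before delivery, messages that do not match never cross the network to the consumer.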
Finally, RocketMQ offers high throughput and low latency. During Alibaba's Double Eleven, RocketMQ handles trillion-level message peaks while maintaining millisecond-level response times.
Next, let us review the development of the RocketMQ 4.x versions. When it was first open-sourced, RocketMQ already supported different message types such as normal messages, ordered messages, and delayed messages, which covered most business scenarios.
In version 4.3.0, transaction messages were officially released. They solve the problem of data inconsistency between upstream and downstream systems through a two-phase approach.
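The two-phase idea behind transaction messages can be sketched as a small simulation. This is an illustration under stated assumptions, not RocketMQ's implementation: `TransactionBroker` and `send_in_transaction` are hypothetical names, and the broker's check-back of unresolved half messages is omitted.

```python
class TransactionBroker:
    """Toy broker that hides half messages until they are committed."""

    def __init__(self):
        self.half = {}     # msg_id -> body, invisible to consumers
        self.visible = []  # bodies that consumers can receive

    def send_half(self, msg_id, body):
        self.half[msg_id] = body

    def end_transaction(self, msg_id, commit):
        body = self.half.pop(msg_id)
        if commit:
            self.visible.append(body)  # rollback simply drops the message


def send_in_transaction(broker, msg_id, body, local_transaction):
    """Phase 1: send a half message. Phase 2: commit or roll back."""
    broker.send_half(msg_id, body)
    try:
        ok = bool(local_transaction())
    except Exception:
        ok = False  # a failing local transaction rolls the message back
    broker.end_transaction(msg_id, commit=ok)
    return ok


broker = TransactionBroker()
send_in_transaction(broker, "m1", "order-1 paid", lambda: True)
send_in_transaction(broker, "m2", "order-2 paid", lambda: False)
```

Only the message whose local transaction succeeded becomes visible downstream, so the upstream database state and the delivered messages stay consistent.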
In version 4.4.0, RocketMQ added message tracing, which lets users follow the send and receive path of each message and helps with troubleshooting. The same version added ACL permission control, which improved RocketMQ's management capabilities and security.
In version 4.5.0, RocketMQ introduced a multi-replica mode based on Raft. In Raft mode, if the master of a Broker group goes down, the remaining nodes in the group elect a new master. The Broker group therefore gains automatic failover, which also solves the problem of highly available ordered messages and further improves the availability of RocketMQ.
In version 4.6.0, we launched a lightweight Pull Consumer, which gives users an API better suited to stream computing. This version also began to support the new Request-Reply message, which gives RocketMQ the ability to make synchronous RPC-style calls and helps bridge calls across isolated networks. In this version, RocketMQ also started to support IPv6 and was the first messaging middleware to do so.
In version 4.7.0, RocketMQ refactored the master-slave synchronous replication process. By making threads asynchronous, synchronous replication and disk flushing were pipelined, and the performance of synchronous double-write improved several times over.
In version 4.8.0, the RocketMQ Raft mode improved qualitatively. In terms of performance, asynchronization, batch replication, and similar techniques improved it several times over. In terms of stability, OpenChaos was used to run tests covering machine crashes, killed processes, OOM, and various network partitions and delays, and important bugs were fixed. In terms of functionality, it supports Preferred Leader, so that a Broker group can prefer a designated node as leader, and it also supports features such as batch messages.
Version 4.9.0 mainly improved observability, including support for OpenTracing and Trace support for transaction messages and the Pull Consumer.
As can be seen, after years of development RocketMQ has improved greatly in performance, stability, reliability, and observability. Throughout this process, companies other than Alibaba have made outstanding code contributions, which also shows that RocketMQ has become a diverse and thriving community.
Beyond the main RocketMQ repository, the development of the RocketMQ ecosystem projects is also very encouraging, especially the integration with popular cloud-native technologies. For cloud-native deployment, we have the RocketMQ Operator and RocketMQ Docker projects. For microservice development frameworks, the RocketMQ community built RocketMQ Spring Boot Starter, which makes it easy for open source users to connect their microservice systems to RocketMQ. The RocketMQ implementations of Spring Cloud Stream Binder and Spring Cloud Bus have also been officially included by Spring Cloud.
In terms of Service Mesh, RocketMQ was the earliest messaging product integrated with Envoy, and it has now also completed integration with Dapr. In terms of Serverless, the work includes adapting CloudEvents, and the community has open-sourced the RocketMQ Knative Source repository.
In terms of observability, RocketMQ supports OpenTracing, OpenTelemetry, Prometheus Exporter, etc.
In the field of Eventing, we have our own RocketMQ Connector, which handles data exchange and synchronization between RocketMQ and various external components such as MySQL and ElasticSearch, as well as data flow between MQ clusters. In the field of Streaming, RocketMQ 5.0 will release the native lightweight real-time computing framework RocketMQ-Streams. RocketMQ is also actively integrating with existing big data frameworks such as Flink, Storm, and Spark.
We can see that RocketMQ is not only a pipeline for business messages; it also carries event flows, some offline computation on business data, and lightweight real-time computation. Through its development across messages, events, and streams, RocketMQ has formed a complete, self-contained ecosystem and is gradually becoming a hyper-converged processing platform for messages, events, and streams.
RocketMQ 5.0 overview
Before officially introducing RocketMQ 5.0, we need to answer the following question: why do we need RocketMQ 5.0? After talking with many contributors and drawing on extensive experience operating RocketMQ, we found two main reasons:
First, the demands of the open source community have become more and more prominent. A large number of enterprises have adopted RocketMQ, and each of them has a wealth of business scenarios. RocketMQ 4.x mainly serves the field of business messaging, so how to use RocketMQ to process this high-value data with real-time computation has become an important direction for enterprises to explore next. This is why RocketMQ is actively expanding from messaging into stream computing.
Second, the quality requirements for cloud messaging services continue to rise. Alibaba Cloud, an in-depth participant in and contributor to RocketMQ, currently serves tens of thousands of companies with its messaging service. The service requirements of these customers and the growth of Alibaba's own business place higher demands on RocketMQ. How to build a single architecture that meets the needs of different users in different scenarios at the same time has become a key problem for RocketMQ 5.0 to solve.
Therefore, based on a wide range of real business scenarios, RocketMQ 5.0 is a brand-new architecture that was born on the cloud and has grown on the cloud. After continuous exploration and practice, RocketMQ 5.0 mainly has the following characteristics:
1. High SLA, low cost: availability, high performance, and low cost in line with what the cloud offers
2. Schedulable: any component can be recombined and orchestrated to adapt to diverse scenarios
3. Extensible: an open and rich ecosystem
4. Elastic: highly automated scale-out and scale-in
5. Standardized: community standards, in line with industry standards
RocketMQ 5.0 is a cloud-native hyper-converged platform for messages, events, and streams. Following the architecture diagram, we will go through its parts one by one:
1. Lightweight SDK
RocketMQ 5.0 provides a lightweight client that is easy to integrate into other systems. The complex logic of load balancing and consumption position management is moved to the server, so that the client becomes stateless. In terms of protocols, in addition to the original protocol, gRPC, the cloud-native communication standard, is fully supported.
2. Minimalist architecture
RocketMQ 5.0 still introduces no external dependencies and keeps the operation and maintenance burden very low. With loose coupling between nodes, any service node can be migrated at any time. RocketMQ 5.0 is designed for failure: the failure or migration of any service node can be tolerated.
3. Storage and computing separation architecture
Broker nodes will become truly stateless service nodes with no Topic binding. That is, messages can be sent and consumed on any computing node, a single access point can proxy all traffic, and the computing layer and storage layer can scale out and in independently. Once storage and computing are separated, computing nodes can handle different protocols, including Remoting, gRPC, MQTT, and AMQP. In addition, ACL control, subscription relationships, and multi-tenancy will be handled on the computing nodes. Most importantly, the architecture is both separable and combinable. It supports small clusters as well as very large clusters and adapts to multiple business scenarios, which reduces the operation and maintenance burden.
4. Multi-mode storage
The RocketMQ Raft mode uses three replicas. Combined with a cloud disk that itself keeps three copies, this amounts to nine copies. Although nine copies bring higher reliability, they also waste a great deal of cost. RocketMQ 5.0 therefore solves this problem through multi-mode storage. For example, on ordinary block storage devices, a two-replica or three-replica deployment can be chosen according to availability requirements, while on the cloud a single replica is used, which better fits cloud disks and makes full use of cloud infrastructure to reduce operation and maintenance costs.
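The replica arithmetic above is simple enough to write down. The sketch below assumes, for illustration only, that a local block device holds one physical copy and that a cloud disk holds three; `physical_copies` is a hypothetical helper.

```python
def physical_copies(broker_replicas, copies_per_disk):
    """Total physical copies = application-level replicas x copies kept by each disk."""
    return broker_replicas * copies_per_disk


raft_on_cloud_disk = physical_copies(broker_replicas=3, copies_per_disk=3)
single_replica_on_cloud_disk = physical_copies(broker_replicas=1, copies_per_disk=3)
three_replicas_on_local_disk = physical_copies(broker_replicas=3, copies_per_disk=1)
```

Under these assumptions, a single replica on a cloud disk already yields as many physical copies as three replicas on local devices, while Raft on top of cloud disks triples the storage again.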
5. Broad use of and integration with cloud-native infrastructure
RocketMQ 5.0 supports projects such as OpenTelemetry and Prometheus to strengthen the observability of RocketMQ. It also supports the K8s ecosystem better. For example, RocketMQ Operator can bring up a RocketMQ cluster with one command, provides full life-cycle management from release to grayscale rollout, and supports automatic elastic scaling.
Core feature 1: Separable and combinable storage and computing separation architecture
Next, we will introduce the separable and combinable storage and computing separation architecture in detail. Separable and combinable means that RocketMQ can start the Broker in a single process as it does today, or deploy computing and storage separately. When deployed separately, the computing nodes can be truly stateless. RocketMQ is very cautious about introducing a storage and computing separation architecture, because integrated deployment brings many benefits. In big data scenarios, integrated deployment provides computation close to the data and reduces bandwidth cost. In business messaging scenarios, integrated deployment reduces latency. At the same time, separating storage and computing also has many benefits. For example, scaling becomes more flexible, because computing resources and storage resources can be scaled out or in independently.
Therefore, RocketMQ 5.0 will provide a separable and combinable storage and computing separation architecture that can adapt to a variety of scenarios. Computing nodes are completely stateless. Management and control functions such as protocol adaptation, traffic control, and tenant management are all completed on the computing nodes. In addition, through the POP consumption mode, the client's load balancing logic is moved up to the computing nodes. With no queue binding, any computing node can send and receive messages. And because the nodes are stateless, elastic scaling can be completed in seconds, with no Rebalance burden in the process.
At the same time, RocketMQ 5.0 will optimize and adjust the storage cluster. The storage cluster natively retains storage support for multiple message types, including transaction messages, scheduled messages, retry messages, and dead-letter messages. For replication, different options are provided for different scenarios, including multiple replicas on local block devices and a single replica on cloud disks. Multi-mode storage makes full use of cloud infrastructure to reduce costs.
Another very important point is support for multiple indexes. Today RocketMQ stores messages in a CommitLog, and a background thread dispatches them to build the ConsumeQueue and the index. RocketMQ 5.0 will comprehensively enhance indexing and support more kinds of indexes. For example, with a batch index, messages can be sent, stored, and received in batches, which strengthens RocketMQ's batch capabilities. As another example, a message queue traditionally only moves messages along, and its query ability is relatively weak. In RocketMQ 5.0, messages and KV will be better combined to build query indexes and strengthen KV capabilities. With one copy of the data and multiple indexes, RocketMQ 5.0 can serve different scenarios.
Core feature 2: stream-batch integrated data access
First, let us introduce a brand-new consumption mode: POP consumption. The figure in the upper left corner shows the existing consumer-side load balancing architecture of RocketMQ 4.x. Suppose a Topic is distributed across 3 Brokers with 9 queues in total, and in cluster mode a consumer group has 3 consumers. After load balancing, each consumer is assigned three queues.
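The 9-queue, 3-consumer example can be reproduced with a small allocation sketch. It assumes an "average" strategy that hands each consumer a contiguous slice of the sorted queue list; the source does not name the strategy, and `allocate_averagely` is a hypothetical helper rather than RocketMQ's code.

```python
def allocate_averagely(queues, consumers, current):
    """Give `current` a contiguous slice; earlier consumers absorb any remainder."""
    consumers = sorted(consumers)
    index = consumers.index(current)
    base, extra = divmod(len(queues), len(consumers))
    size = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return queues[start:start + size]


# 3 brokers x 3 queues each = 9 queues in total.
queues = [(broker, q) for broker in ("broker-0", "broker-1", "broker-2") for q in range(3)]
consumers = ["c1", "c2", "c3"]

assignment = {c: allocate_averagely(queues, consumers, c) for c in consumers}
```

Every consumer runs the same deterministic computation over the same sorted inputs, so each one ends up owning a fixed set of three queues. That fixed ownership is exactly the binding that causes trouble when one consumer stalls.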
But this also brings some problems. Suppose a consumer suddenly hangs: it can no longer consume messages, but it is not disconnected and still keeps its heartbeat connection with the Broker. It is therefore not removed and no Rebalance is triggered, so a large number of messages pile up in the queues assigned to this consumer and consumption of those queues stalls.
This is essentially a binding problem: once Rebalance has run, each Consumer is bound to its queues. In response, RocketMQ 5.0 introduces a new consumption mode, POP consumption. It removes the binding created by Rebalance. A queue can be consumed by any number of Consumers, and concurrency is controlled through a queue lock on the Broker side.
In POP consumption, the client sends consumption requests directly to the queues of each Broker, and the Broker returns messages to the waiting client. When the client finishes consuming, it returns the corresponding ACK result to notify the Broker, and the Broker marks the consumption result of the message. If no response arrives before the timeout, or if consumption fails, the message is retried.
For each POP request, the Broker performs the following three operations:
1. Lock the corresponding queue, then fetch the queue's messages from the Store layer;
2. Write a CK message, indicating that the fetched messages are being consumed via POP;
3. Finally, commit the current position and release the lock.
The CK message is in fact a scheduled message that records the exact position of the POPed message. If the client does not respond before the timeout, the Broker consumes the CK message itself and writes the message at the recorded position to the retry queue. If the Broker receives the client's ACK with the consumption result, it deletes the corresponding CK message and then decides, based on the result, whether a retry is needed.
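The POP steps and the role of the CK message can be sketched as a toy single-queue broker. This is a simulation under stated assumptions, not the Broker's actual implementation: `PopQueue`, `invisible_time`, and the explicit `now` clock are hypothetical, and the CK message is modelled as a position-to-deadline entry instead of a real scheduled message.

```python
class PopQueue:
    """Toy queue illustrating POP, CK records, ACK, and timeout retry."""

    def __init__(self, messages, invisible_time):
        self.messages = list(messages)
        self.invisible_time = invisible_time
        self.offset = 0          # next position to hand out
        self.locked = False
        self.checkpoints = {}    # position -> deadline (stands in for the CK message)
        self.retry_queue = []

    def pop(self, now):
        if self.locked or self.offset >= len(self.messages):
            return None
        self.locked = True                                          # 1. lock, then read
        try:
            position = self.offset
            body = self.messages[position]
            self.checkpoints[position] = now + self.invisible_time  # 2. write CK
            self.offset = position + 1                              # 3. commit position
            return position, body
        finally:
            self.locked = False                                     # release the lock

    def ack(self, position, success=True):
        """Delete the CK record; a failed consumption goes to the retry queue."""
        if self.checkpoints.pop(position, None) is None:
            return
        if not success:
            self.retry_queue.append(self.messages[position])

    def expire(self, now):
        """Retry every message whose CK deadline passed without an ACK."""
        expired = sorted(p for p, deadline in self.checkpoints.items() if deadline <= now)
        for position in expired:
            del self.checkpoints[position]
            self.retry_queue.append(self.messages[position])


queue = PopQueue(["m0", "m1", "m2"], invisible_time=30)
first = queue.pop(now=0)    # consumer A
second = queue.pop(now=1)   # consumer B pops from the same queue
queue.ack(first[0])         # A acknowledges in time
queue.expire(now=40)        # B never answered, so m1 is retried
```

Two consumers read from the same queue without owning it, and a consumer that stops responding only delays the messages it had in flight.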
The overall process shows that POP consumption does not require Rebalance, which avoids the consumption delay that Rebalance causes. At the same time, a client can consume from the queues of every Broker, which avoids the message backlog caused by a hung machine.
With POP, PUSH, PULL, and other modes, RocketMQ provides stream-batch integrated data access. For example, in Streaming scenarios, the original PUSH mode guarantees ordered consumption. In scenarios where ordering matters less, such as batch processing, POP consumption allows highly concurrent reads on the same queue to speed up data reading. The POP consumption model also makes the client more lightweight. Much of the logic lives on the server, which makes it easier to write clients in multiple languages.
Core feature 3: Extremely elastic scaling
The figure above shows the existing architecture of RocketMQ. For example, by disabling writes on Broker 1001, we can let its traffic flow naturally to Broker 1002. However, in the field of Streaming, upper-level services generally require the set of storage queues to stay fixed, because only then can the order and integrity of streaming data processing be guaranteed. Scaling must therefore not change the number of queues. For this reason, the RocketMQ 5.0 Preview version introduces the concept of a logical queue, which logically combines the original physical queues, so that one logical queue can be distributed across different Brokers. For example, for the logical queue in the figure, offsets 0-100 are on Broker 1001, 100-1000 on Broker 1002, and 1000-2000 on Broker 1003. In this way, multiple physical queues are chained together into one large logical queue.
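The offset ranges in this example map naturally onto a sorted segment table. The sketch below is a hypothetical illustration, not the Preview implementation: `LogicalQueue`, `bind`, and `locate` are invented names, and the ranges are treated as half-open intervals `[start, end)`, which is an assumption.

```python
from bisect import bisect_right


class LogicalQueue:
    """Toy logical queue: an ordered chain of (start, end, broker) segments."""

    def __init__(self):
        self._starts = []    # sorted logical start offsets, one per segment
        self._segments = []  # (start, end, broker), end is exclusive

    def bind(self, broker, start, end):
        """Append a physical queue segment; no message data is copied."""
        if end <= start:
            raise ValueError("segment must not be empty")
        if self._segments and self._segments[-1][1] != start:
            raise ValueError("segments must be contiguous")
        self._starts.append(start)
        self._segments.append((start, end, broker))

    def locate(self, offset):
        """Return the broker that stores the given logical offset."""
        index = bisect_right(self._starts, offset) - 1
        if index < 0 or offset >= self._segments[index][1]:
            raise IndexError(f"offset {offset} is outside the logical queue")
        return self._segments[index][2]


logical = LogicalQueue()
logical.bind("broker-1001", 0, 100)
logical.bind("broker-1002", 100, 1000)
logical.bind("broker-1003", 1000, 2000)

owner_of_99 = logical.locate(99)
owner_of_100 = logical.locate(100)
owner_of_1999 = logical.locate(1999)
```

Adding a Broker only appends one entry to the table, which is why the upper layer keeps seeing a single queue with continuous offsets.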
Since building a logical queue is only a binding operation, it is very lightweight and therefore provides second-level elastic scaling. No data is copied in the process: once the Broker has been added and the binding is done, traffic has already been allocated. In addition, the two queue modes are compatible. By default, the original physical queues are used; logical queues are used only when they are explicitly enabled for a single Topic.
Core feature 4: lightweight real-time computing
RocketMQ 5.0 also brings a heavyweight feature: the lightweight real-time computing framework RocketMQ Streams. Its design goal is to let users complete the lightweight data processing and computation required by most business scenarios using only existing RocketMQ resources, without relying on external heavyweight computing products.
RocketMQ Streams has few dependencies and is simple to deploy. It can scale freely by using RocketMQ's Rebalance capability. It supports common operators such as Map and Filter, as well as Window, Join, and dimension tables. In addition, compared with other message-based real-time computing platforms, RocketMQ Streams not only provides native, dependency-free support but is also compatible with the Flink SQL standard and provides UDF/UDAF/UDTF capabilities.
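To show what the Map, Filter, and Window operators do to a stream of messages, here is a plain Python sketch. It does not use the RocketMQ Streams API; the event fields, `tumbling_window_sum`, and the 10-unit window size are all hypothetical choices for illustration.

```python
from collections import defaultdict


def tumbling_window_sum(events, window_size, key, value):
    """Sum `value(e)` per (window start, key) over fixed, non-overlapping windows."""
    totals = defaultdict(int)
    for event in events:
        window_start = event["ts"] // window_size * window_size
        totals[(window_start, key(event))] += value(event)
    return dict(totals)


events = [
    {"ts": 1, "user": "a", "amount": 5},
    {"ts": 4, "user": "b", "amount": -1},
    {"ts": 7, "user": "a", "amount": 3},
    {"ts": 12, "user": "a", "amount": 2},
]

# Filter: drop invalid events. Map: convert the amount to cents.
valid = filter(lambda e: e["amount"] > 0, events)
in_cents = map(lambda e: {**e, "amount": e["amount"] * 100}, valid)

# Window: aggregate per user in windows of 10 time units.
result = tumbling_window_sum(
    in_cents, window_size=10, key=lambda e: e["user"], value=lambda e: e["amount"]
)
```

The same chain of filter, map, and windowed aggregation is the kind of lightweight processing that such a framework lets users run directly on top of the message data.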
In the real-time computing ecosystem, RocketMQ is also actively connecting with other big data frameworks, including Flink and Spark. In particular, the RocketMQ-Flink Connector, which is based on the latest standards, will graduate soon.
RocketMQ 5.0 Landscape
RocketMQ 5.0 will be officially released this year. The 5.0 Preview version is already under discussion in the community, and the code is available in the GitHub repository. The 5.0 Preview version will introduce logical queues and stream-batch integrated data access. Later, we will officially release the real-time stream computing framework RocketMQ Streams and support batch processing and batch indexing in RocketMQ 5.0. In subsequent milestones, RocketMQ 5.0 will complete gRPC protocol support, a new lightweight client, AMQP and MQTT protocol support, and the separable and combinable storage and computing separation architecture.
I also hope that more people will join the Apache RocketMQ community to build the next generation of cloud-native messaging engine and create a hyper-converged processing platform for Messaging, Eventing, and Streaming.