Foreword

At the RocketMQ Summit, the global developer conference that ended last month, the Apache RocketMQ community presented an overview of the capabilities of the new generation of RocketMQ and explained the technical positioning and development direction of RocketMQ 5.0 to developers.

Over the past seven years of large-scale cloud computing practice, RocketMQ has evolved continuously. Today, RocketMQ officially enters the 5.0 era.

From the community's presentation of version 5.0, we can see that, amid the wave of cloud-native adoption and enterprise migration to the cloud, Apache RocketMQ has made many architectural upgrades and productization improvements to better meet the needs of business developers. So how can RocketMQ 5.0 be put into production in an enterprise? This article focuses on cloud-native messaging architecture and product capabilities. It describes how Alibaba Cloud has made its own design decisions and evolved its service on top of the new RocketMQ 5.0 kernel, and how it meets the technical and capability requirements of a growing number of enterprise customers.

Evolution of cloud native messaging services

First, let's take a look at the evolution of cloud-native messaging services.

Looking ahead, messaging products that fit a cloud-native architecture need to make important breakthroughs in the following areas:

  • Large-scale elasticity: The point of moving to the cloud is to shed the burden of resource provisioning and focus on business integration and development. The team operating the message service should supply resources that match the business model and scale elastically as business traffic grows. This reduces the system risk posed by unpredictable burst traffic and also improves resource utilization.
  • Ease of use: Ease of use is a key quality of middleware that must be integrated into applications. A message service should reduce the burden on users and help them avoid mistakes at every step, from API design to integration development to configuration and operations. Only a low barrier to entry can open up the market and broaden awareness and the user base.
  • Observability: Observability matters to every party involved in a message service. Service providers should offer observation and diagnosis capabilities with clear boundaries and open standards, so that operators are relieved of routine troubleshooting, users can diagnose issues themselves, and responsibilities at each boundary are clear.
  • High stability and high SLA: Stability is an essential capability of any production system. Messaging is often part of the core transaction path, so the message system should state its availability and reliability targets clearly. Users can then design their own failure recovery and redundancy mechanisms based on a well-defined SLA.

Based on these four key directions, the following sections give an overview of how Alibaba Cloud RocketMQ 5.0 delivers on each of them.

Elasticity at scale: Provides optimal resource provisioning capabilities that match business models

Message services are generally integrated into the core paths of a business, such as transactions and payments. These scenarios often have fluctuating traffic, for example during big promotions, flash sales, and morning peaks.

For workloads that fluctuate like this, the Alibaba Cloud RocketMQ 5.0 messaging service scales resources to match business demand. Within a relatively stable processing baseline, fixed resources are reserved at the lowest cost. For occasional bursts above the baseline, the service scales adaptively and bills by actual usage on a pay-as-you-go basis. Combining the two modes allows the system to run stably and safely at high utilization without permanently reserving large amounts of resources for uncertain traffic peaks.
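
To make the trade-off concrete, here is a minimal sketch that compares a reserved baseline plus pay-per-use bursts against provisioning for the peak all day. The function names, the hourly traffic samples, and both unit prices are hypothetical values chosen for illustration; they are not Alibaba Cloud's actual billing rules.

```python
def hybrid_cost(hourly_tps, baseline_tps, reserved_price, burst_price):
    """Cost of a reserved baseline plus pay-per-use capacity above it.

    hourly_tps:     peak TPS observed in each hour (hypothetical samples)
    reserved_price: price per reserved TPS per hour (hypothetical unit)
    burst_price:    price per TPS per hour above the baseline (hypothetical unit)
    """
    reserved = baseline_tps * reserved_price * len(hourly_tps)
    burst = sum(max(0, tps - baseline_tps) for tps in hourly_tps) * burst_price
    return reserved + burst


def peak_provisioned_cost(hourly_tps, reserved_price):
    """Cost of reserving enough capacity for the highest peak in every hour."""
    return max(hourly_tps) * reserved_price * len(hourly_tps)


# One day: steady 1000 TPS with a two-hour spike to 5000 TPS.
traffic = [1000] * 22 + [5000] * 2
hybrid = hybrid_cost(traffic, baseline_tps=1000, reserved_price=1, burst_price=3)
fixed = peak_provisioned_cost(traffic, reserved_price=1)
print(hybrid, fixed)  # 48000 120000
```

Even with burst capacity priced at three times the reserved rate in this made-up example, the combined mode costs less than half of peak provisioning, because the spike lasts only two hours out of 24.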

Beyond scaling message-processing throughput, a message system is also a stateful system that stores a large amount of high-value business data. When call pressure on the system changes, storage itself also needs to be elastic: it must ensure that no data is lost while keeping storage costs down and avoiding waste. A traditional architecture based on local disks inherently has trouble scaling capacity. First, local disk capacity is limited, so expanding capacity means adding nodes, which wastes computing resources. Second, storage cost can be reduced only by isolating business traffic and taking nodes offline, which is a very complicated operation.

Message storage in Alibaba Cloud RocketMQ 5.0 is serverless by design. Storage space is used on demand and billed on a pay-as-you-go basis. Users only need to set a reasonable TTL for their requirements to keep data intact over long retention periods.
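
As a rough illustration of how a TTL bounds the amount of stored (and therefore billed) data, the sketch below drops messages older than the TTL and reports the space reclaimed. `StoredMessage` and `expire` are hypothetical names, and the real service's retention mechanism is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class StoredMessage:
    msg_id: str
    stored_at: int   # storage time in seconds (hypothetical clock)
    size_bytes: int


def expire(messages, ttl_seconds, now):
    """Return (kept_messages, reclaimed_bytes) after applying the TTL."""
    kept = [m for m in messages if now - m.stored_at < ttl_seconds]
    reclaimed = sum(m.size_bytes for m in messages if now - m.stored_at >= ttl_seconds)
    return kept, reclaimed


store = [
    StoredMessage("m1", stored_at=0, size_bytes=400),
    StoredMessage("m2", stored_at=500, size_bytes=300),
    StoredMessage("m3", stored_at=900, size_bytes=200),
]
kept, reclaimed = expire(store, ttl_seconds=600, now=1000)
print([m.msg_id for m in kept], reclaimed)  # ['m2', 'm3'] 400
```

A longer TTL keeps more history available for replay at a higher storage cost; a shorter one lowers cost but shortens the window in which messages can still be read.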

Ease of Integration: Simplify business development, reduce mental burden and cost of understanding

Ease of integration is a system design constraint: the message service must reduce the burden on users and help them avoid mistakes at every step, from API design to integration development to configuration and operations. As a typical example, in message queues such as RocketMQ 4.x or Kafka, the load balancing strategy is a frequent source of trouble when consuming messages. The business side has to keep track of both the number of queues (partitions) in the topic and the current number of consumers. Because load balancing and task assignment happen at queue granularity, whenever consumers differ in processing capacity or the queues cannot be divided evenly among them, some consumers will inevitably build up a backlog that they cannot clear.
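
The imbalance is easy to reproduce with a small sketch of queue-granularity assignment. `allocate_queues` below is a hypothetical helper that spreads queue ids over consumers as evenly as possible, in the spirit of an averaging strategy; it is not the code of any specific client.

```python
def allocate_queues(queue_count, consumers):
    """Assign queue ids 0..queue_count-1 to consumers as evenly as possible."""
    assignment = {}
    base, extra = divmod(queue_count, len(consumers))
    start = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional queue.
        size = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + size))
        start += size
    return assignment


print(allocate_queues(8, ["c1", "c2", "c3"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5], 'c3': [6, 7]}
print(allocate_queues(2, ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

With 8 queues and 3 consumers, two consumers carry 50% more queues than the third. With 2 queues and 3 consumers, one consumer receives nothing at all, so adding consumers beyond the queue count does not add processing capacity.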

In a typical business integration scenario, the client actually only needs to consume in a stateless message model, and the business only needs to care whether the message itself is processed, not the internal storage model and strategy.

Based on this idea, Alibaba Cloud RocketMQ 5.0 provides a new SimpleConsumer model, which supports atomic operations such as consuming, retrying, and committing at the granularity of a single message.
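
The idea of message-granularity consumption can be sketched with a toy in-memory model. `ToyBroker` and its methods are hypothetical and exist only for illustration; this is not the RocketMQ client API. The assumption here is that a received message is hidden for a fixed period and becomes visible again for retry unless it is acknowledged.

```python
class ToyBroker:
    """In-memory stand-in for a broker with per-message visibility timeouts."""

    def __init__(self):
        self._messages = {}     # msg_id -> body
        self._visible_at = {}   # msg_id -> time from which it may be delivered
        self._next_id = 0

    def send(self, body):
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = body
        self._visible_at[msg_id] = 0
        return msg_id

    def receive(self, max_messages, invisible_seconds, now):
        """Hand out up to max_messages visible messages and hide each for a while."""
        batch = []
        for msg_id in sorted(self._messages):
            if len(batch) == max_messages:
                break
            if self._visible_at[msg_id] <= now:
                self._visible_at[msg_id] = now + invisible_seconds
                batch.append((msg_id, self._messages[msg_id]))
        return batch

    def ack(self, msg_id):
        """Commit one message; it will never be delivered again."""
        self._messages.pop(msg_id, None)
        self._visible_at.pop(msg_id, None)


broker = ToyBroker()
broker.send("order-1")
broker.send("order-2")

first = broker.receive(max_messages=10, invisible_seconds=30, now=0)
broker.ack(first[0][0])  # order-1 processed; order-2 is never acked
during = broker.receive(max_messages=10, invisible_seconds=30, now=10)
retry = broker.receive(max_messages=10, invisible_seconds=30, now=31)
print(first, during, retry)
# [(0, 'order-1'), (1, 'order-2')] [] [(1, 'order-2')]
```

Any consumer can pick up any visible message in this model, so the client never needs to know how many queues exist or which consumer owns which queue.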

Observability: Provide self-diagnosis capabilities with clear boundaries and open standards

Anyone who has operated a message queue knows that a message system couples upstream production with downstream consumption. When a problem appears on the business side, it is often hard to tell whether the message service or the business processing logic is at fault.

Observability in Alibaba Cloud RocketMQ 5.0 is designed to resolve this blurred boundary. Built on events, traces, and metrics, it covers every detail of the message path at the level of points, lines, and surfaces. The three are defined as follows:

  • Events: cover server-side operations events, such as outages, restarts, and configuration changes, as well as client-side change events, such as subscribing, unsubscribing, going online, and going offline.
  • Traces: cover the life cycle of a message or call chain. They show the whole path of a message from production to storage and finally to completed consumption, capture every participant in the chain along a timeline, and narrow down the scope of a problem.
  • Metrics: provide broader observation and early warning by quantifying the capabilities of the message system, such as send and receive TPS, throughput, traffic, storage space, failure rate, and success rate.
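
As a small illustration of how a trace narrows down a problem, the sketch below groups span records by message id, orders them on a timeline, and finds the hop with the largest delay. The record fields (`msg_id`, `stage`, `ts`) and both function names are hypothetical; they do not reflect the actual trace schema.

```python
from collections import defaultdict


def build_traces(spans):
    """Group span records by message id and order each group by timestamp."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["msg_id"]].append(span)
    return {mid: sorted(group, key=lambda s: s["ts"]) for mid, group in traces.items()}


def slowest_hop(trace):
    """Return (from_stage, to_stage, gap_ms) for the largest gap between stages."""
    hops = [(a["stage"], b["stage"], b["ts"] - a["ts"]) for a, b in zip(trace, trace[1:])]
    return max(hops, key=lambda hop: hop[2])


# Timestamps in milliseconds; records arrive out of order on purpose.
spans = [
    {"msg_id": "m1", "stage": "consume", "ts": 905},
    {"msg_id": "m1", "stage": "produce", "ts": 0},
    {"msg_id": "m1", "stage": "store", "ts": 5},
]
trace = build_traces(spans)["m1"]
print([s["stage"] for s in trace], slowest_hop(trace))
# ['produce', 'store', 'consume'] ('store', 'consume', 900)
```

In this made-up trace the message was stored within 5 ms but consumed 900 ms later, which points the investigation at the consumer side rather than at the message service.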

Alibaba Cloud RocketMQ has built up substantial observability capabilities. It not only took the lead in supporting complete message trace queries, but in version 5.0 it also supports reporting client and server Trace and Metrics data over the standard OpenTelemetry protocol. Once the data is stored in a third-party Trace or Metrics system, it can be displayed and analyzed in a standard way with open source products such as Prometheus and Grafana.

Stability SLA: Provide assessable, quantifiable, and well-defined service assurance capabilities

Stability is an essential capability of a production system. The message system is often part of the core transaction path, so its stability directly determines whether the business is complete and available. Stability, however, is not only a matter of operations and management. It should be addressed from the architecture design stage by quantifying service boundaries and service indicators. Only when the availability and reliability targets of a service are clearly defined can users design their own failure fallback and redundancy mechanisms.

The traditional, passive approach to protection relies on operations tooling and can only handle basic scaling and system metric monitoring. It cannot provide quantified service guarantees for the complex edge scenarios of messaging, such as message backlog, cold reads, and broadcasting. Once an upstream business triggers one of these scenarios, the system is overwhelmed and loses its ability to serve.

The systematic stability work in Alibaba Cloud RocketMQ 5.0 quantifies service capability for scenarios such as message backlog and cold reads from the system design stage, and defines reasonable targets for message send RT, end-to-end latency, and send and receive TPS. When a workload triggers these conditions, the system applies throttling and protection within its tolerance range.
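
The article does not say which throttling algorithm the service uses. As one common way to enforce a quantified TPS limit, here is a minimal token bucket sketch; the class name and parameters are assumptions made for illustration.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a fixed rate."""

    def __init__(self, rate_per_second, capacity):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = 0.0          # time of the previous call, in seconds

    def allow(self, now, cost=1):
        """Return True if a request arriving at time `now` is within the limit."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_second=2, capacity=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(results)  # [True, True, False, True]
```

The third request at time 0 is rejected because the bucket is empty, while the request one second later succeeds after the refill. Rejecting excess requests early keeps the rest of the system within its stated RT and TPS targets instead of letting a backlog or cold read exhaust it.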

This article introduces the evolution and direction of RocketMQ 5.0 from the aspects of large-scale elasticity, ease of integration, observability, and stability SLA, and also introduces the practice and implementation of Alibaba Cloud Message Queue RocketMQ 5.0 in these areas.


Alibaba Cloud Message Queue RocketMQ 5.0 is now officially commercially available, with comprehensive improvements in functionality, flexibility, ease of use, and O&M convenience. At the same time, pricing has been reduced by up to 50% compared with previous-generation instances, helping enterprises cut costs, improve efficiency, and develop and integrate their business with a lower barrier to entry. The new generation of instances supports scaling freely from 0 to 1 million TPS, burst traffic elasticity, and serverless storage. For observability, it supports end-to-end trace integration and custom Metrics integration. For ease of integration, it supports a new generation of lightweight, native multi-language SDKs that are more stable and easier to use.
