Major release: Microservices Engine (MSE) Professional Edition

Highlights: a tenfold performance increase, a stronger SLA guarantee, and a limited-time 20% discount on resource packs for new users.

Microservices Engine (MSE) Professional Edition has been released with support for Nacos 2.0. Compared with the Basic Edition, the Professional Edition offers a stronger SLA guarantee, a tenfold performance increase, 99.95% availability, and further enhanced configuration capabilities. New users get 20% off their first purchase; click "View details" for more information.

Since the release of Nacos 1.0, Nacos has been quickly adopted by thousands of companies and has built a strong ecosystem. However, as users adopted it at greater depth and scale, some performance problems were gradually exposed. We therefore began the cross-generation product design of Nacos 2.0 and, after half a year of work, have fully delivered it, with measured performance improved by 10 times. We believe it can meet the performance needs of all users. On behalf of the community, let me introduce this cross-generation product.

Introduction to Nacos

Nacos is a platform for dynamic service discovery, configuration management, and service management that makes it easier to build cloud-native applications. Incubated at Alibaba and hardened by ten years of Double 11 peak traffic, it has developed core strengths in ease of use, stability, reliability, and performance.

Nacos 2.0 architecture

The new 2.0 architecture not only improves performance by 10 times, but also abstracts the kernel into layers and implements a plug-in extension mechanism.

The architecture of Nacos 2.0 is shown in the figure below. Compared with Nacos 1.X, the main changes are:

  • The communication layer is unified on the gRPC protocol, and the flow-control and load-balancing capabilities of both client and server are improved, raising overall throughput.
  • The storage and consistency models are fully abstracted into layers, making the architecture simpler and clearer, the code more robust, and the performance stronger.
  • Extensible interfaces are designed to improve integration capabilities, for example allowing users to plug in their own security mechanisms.

Nacos 2.0 service discovery and the upgraded consistency model

For service discovery under the Nacos 2.0 architecture, a client initiates service registration or subscription requests over gRPC. The server uses a Client object to record which services each gRPC connection has published and which it has subscribed to, and synchronizes these Client objects between servers. Because the practical access pattern is the mapping from a service to its clients (that is, which client instances belong to a service), the 2.0 server builds indexes and metadata to quickly generate 1.X-style service views, and pushes the assembled service data through a gRPC stream.
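The Client-plus-index model described above can be sketched in a few lines. The following is a minimal, self-contained Python sketch of the idea, not the actual Nacos implementation; the class and method names (Registry, service_view, and so on) are illustrative assumptions:

```python
from collections import defaultdict

class Client:
    """Models the 2.0 server-side Client object: one per gRPC connection,
    recording what that connection has published and subscribed."""
    def __init__(self, connection_id):
        self.connection_id = connection_id
        self.published = {}      # service name -> (ip, port) instance
        self.subscribed = set()  # service names this client watches

class Registry:
    """Client objects are the source of truth; a service -> clients index is
    derived from them so the server can quickly answer 'which instances
    belong to this service?'."""
    def __init__(self):
        self.clients = {}
        self.publisher_index = defaultdict(set)   # service -> connection ids
        self.subscriber_index = defaultdict(set)  # service -> connection ids

    def connect(self, connection_id):
        self.clients[connection_id] = Client(connection_id)

    def register(self, connection_id, service, instance):
        self.clients[connection_id].published[service] = instance
        self.publisher_index[service].add(connection_id)

    def subscribe(self, connection_id, service):
        self.clients[connection_id].subscribed.add(service)
        self.subscriber_index[service].add(connection_id)

    def service_view(self, service):
        # Assemble the 1.X-style service view from the index; in the real
        # server this is the data pushed over the gRPC stream.
        return [self.clients[cid].published[service]
                for cid in sorted(self.publisher_index[service])]

    def disconnect(self, connection_id):
        # When a connection drops, every instance it published goes with it.
        client = self.clients.pop(connection_id)
        for service in client.published:
            self.publisher_index[service].discard(connection_id)
        for service in client.subscribed:
            self.subscriber_index[service].discard(connection_id)

registry = Registry()
registry.connect("conn-1")
registry.connect("conn-2")
registry.register("conn-1", "order-service", ("10.0.0.1", 8080))
registry.register("conn-2", "order-service", ("10.0.0.2", 8080))
registry.subscribe("conn-2", "order-service")
print(registry.service_view("order-service"))  # both instances
registry.disconnect("conn-1")
print(registry.service_view("order-service"))  # only conn-2's instance
```

Because the connection itself is the unit of liveness here, dropping a connection removes all of its instances at once, which is part of how 2.0 sidesteps the per-instance heartbeat processing that burdened 1.X.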

Nacos 2.0 configuration management: the upgraded communication mechanism

Previously, configuration management used the HTTP/1.1 Keep-Alive mode, sending a heartbeat every 30 seconds to simulate a long-lived connection. That protocol was hard to understand, consumed a lot of memory, and had weak push performance. Nacos 2.0 solves these problems thoroughly with gRPC, and memory consumption is greatly reduced.
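To make the contrast concrete, here is a minimal, self-contained Python sketch (an illustration, not the real protocol): the 1.X style detects changes by periodically comparing content digests across held long-polling requests, while the 2.0 style invokes listeners the moment a value is published over the persistent connection. The MD5-digest comparison mirrors how Nacos 1.X long polling detects changes; all names here are illustrative assumptions:

```python
import hashlib
from collections import defaultdict

class ConfigCenter:
    """Sketch of the two notification styles side by side."""
    def __init__(self):
        self.values = {}
        self.listeners = defaultdict(list)

    def publish(self, data_id, content):
        self.values[data_id] = content
        # 2.0 style: push immediately to every registered listener
        # over the persistent (gRPC-like) connection.
        for callback in self.listeners[data_id]:
            callback(content)

    def add_listener(self, data_id, callback):
        self.listeners[data_id].append(callback)

    def digest(self, data_id):
        # 1.X style: long polling detects change by comparing MD5 digests
        # of the configuration content.
        return hashlib.md5(self.values.get(data_id, "").encode()).hexdigest()

center = ConfigCenter()
received = []
center.add_listener("db.yaml", received.append)
center.publish("db.yaml", "pool-size: 10")

# 1.X-style detection: a poller must hold a request, wake up, compare
# digests, and re-issue the request every cycle (30 s in real Nacos 1.X).
old = center.digest("db.yaml")
center.publish("db.yaml", "pool-size: 20")
changed = center.digest("db.yaml") != old

print(received)  # the push-style listener saw both values immediately
print(changed)   # the poll-style check only learns of the change next cycle
```

The memory cost of 1.X comes from holding every pending request and response across those 30-second cycles; the streaming model keeps one connection per client and pays no per-cycle cost.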

Nacos 2.0 architecture advantages

Nacos 2.0 greatly reduces resource consumption, improves throughput, optimizes the interaction between client and server, and is more user-friendly. Although observability is slightly reduced, the overall cost-performance trade-off is very favorable.

Nacos 2.0 performance improvements

Nacos consists of two major modules, service discovery and configuration management, whose business models differ slightly, so the specific stress-test metrics are presented separately below.

Performance improvement of Nacos 2.0 service discovery

In service discovery scenarios, we mainly focus on the number of clients, the number of service instances, and the number of service subscribers, and on the server's push performance and steady-state behavior at large scale. We also pay attention to system performance when large numbers of services come online and go offline.

Capacity and steady state test

This scenario focuses on system performance as the number of services and client instances grows.

The results show that version 2.0.0 runs stably at the 10W (100,000) client scale, and after reaching steady state its CPU consumption is very low. During the initial large-scale registration phase, the burst of registrations and pushes causes some push timeouts, but retries succeed and data consistency is not affected.

In contrast, at the 10W and 5W client scales the 1.X server is stuck in Full GC, pushes fail completely, and the cluster is unavailable. At the 2W client scale the server runs normally, but heartbeats cannot be processed in time, so a large number of services oscillate between removal and re-registration; steady state is never reached and CPU stays high. 1.X runs stably at the 1.2W client scale, but its steady-state CPU consumption is more than three times that of 2.0 at far larger scale.

Frequent change test

This scenario focuses on the throughput and failure rate of the two versions under large-scale business releases and frequent service pushes.

Under frequent changes, both 2.0 and 1.X remain stable after reaching steady state. Because 2.0 produces no instantaneous push storm, its push failure rate is zero; with 1.X's less reliable UDP push, a very small number of pushes time out and must be retried.

Performance improvement of Nacos 2.0 configuration management

Since configuration is a read-heavy, write-light scenario, the bottleneck lies mainly in the number of clients a single server can serve and in the push and retrieval of configuration. The configuration-management stress tests therefore focus on the connection capacity of a single server and on comparisons under large-scale pushes.

Nacos 2.0 connection capacity test

This scenario focuses on system pressure at different client scales.

A single Nacos 2.0 server can support up to 4.2W (42,000) configuration client connections. During connection establishment there are many subscription requests to process, so CPU consumption is high, but after reaching steady state CPU consumption drops to almost nothing.

In contrast, with 6,000 clients Nacos 1.X shows persistently high steady-state CPU and frequent GC. The main reason is that long polling maintains the connection by holding the request: a response must be returned every 30 seconds and the connection and request re-initiated, which requires many context switches and holding all of the Request and Response objects. At the 1.2W client scale, 1.X never reaches steady state, so it cannot support that number of clients.

Nacos 2.0 frequent push test

This scenario focuses on system performance at different push scales.

In the frequently-changing scenario, both versions were tested at 6,000 client connections. The performance overhead of 2.0 is clearly much lower than that of 1.X; in the 3,000 TPS push scenario, the improvement is about threefold.

Nacos 2.0 performance conclusions

In service discovery scenarios, Nacos 2.0 runs stably at the 10W client scale, an increase of about 10 times over the 1.2W scale of Nacos 1.X.

In configuration management scenarios, a single Nacos 2.0 server supports up to 4.2W client connections, a sevenfold increase over Nacos 1.X, and its performance during pushes is significantly better than that of 1.X.

Nacos ecosystem and the 2.X roadmap

Over three years of development, Nacos has come to support almost all open-source RPC frameworks and microservice ecosystems, and has led the development of the cloud-native microservice ecosystem.

Nacos is a core component of the entire microservice ecosystem. It interoperates seamlessly with the Kubernetes service discovery system; it communicates with Istio through the MCP/xDS protocols to deliver Nacos services to sidecars; and it can be combined with CoreDNS to expose Nacos services to downstream callers via domain names.

Nacos has been integrated with various microservice RPC frameworks for service discovery; in addition, it helps the high-availability framework Sentinel distribute various governance rules.

Using an RPC framework alone is sometimes not simple enough, because frameworks such as gRPC and Thrift still require you to start the server yourself and tell the client which IP to call. In those cases it helps to integrate with an application framework such as SCA or Dapr; alternatively, an Envoy sidecar can take over traffic control so that application-layer RPC does not need to know the service's IP list.

Finally, Nacos can also interoperate with various microservice gateways to handle routing at the access layer and the invocation of microservices.

The practice of the Nacos ecosystem at Alibaba

Nacos has now completed the "trinity" of in-house development, open source, and commercialization. Inside Alibaba, business domains such as DingTalk, Kaola, Ele.me, and Youku have all adopted the Nacos service in the cloud product MSE, seamlessly integrating Alibaba's technology stack with the cloud-native one. Let us take DingTalk as a brief example.

Nacos runs on MSE (a fully managed Nacos cluster) for maintenance and multi-cluster management. The business's Dubbo3 and HSF services register with the Nacos cluster through Dubbo3 itself at startup; Nacos then synchronizes the service information to Istio and the Ingress-Envoy gateway via the MCP protocol.

User traffic enters the group's VPC network from the north, first passing through the unified-access Ingress-Tengine gateway, which resolves domain names and routes traffic to different data centers, units, and so on. This week we also released Tengine 2.3.3, with the kernel upgraded to Nginx Core 1.18.0; it supports the Dubbo protocol, DTLSv1 and DTLSv1.2, and the Prometheus format, improving the ecosystem completeness, security, and observability of Alibaba Cloud microservices.

After passing through the unified access layer, a user request is forwarded through the Ingress-Envoy microservice gateway to the corresponding microservice and invoked. If services in other network domains need to be called, the traffic is routed through their Ingress-Envoy microservice gateways into the corresponding VPC networks, connecting services across different security domains, network domains, and business domains.

Calls between microservices go through Envoy sidecars or traditional self-subscription by the microservices themselves. Through these mutual invocations, the user request is finally completed and the result returned to the user.

Planning for Nacos 2.X

Building on the performance problems solved in 2.0, Nacos 2.X will implement new features through plug-ins and refactor a large number of old features, making Nacos more convenient and easier to extend.

Summary

As a cross-generation version, Nacos 2.0 completely solves the performance problems of Nacos 1.X, improving performance by 10 times. Through abstraction and layering it makes the architecture simpler, and through plug-ins it becomes easier to extend, allowing Nacos to support more scenarios and integrate a broader ecosystem. With subsequent iterations, Nacos 2.X will become even easier to use, solve more microservice problems, and explore further in the direction of service mesh.

Copyright Statement: The content of this article is contributed spontaneously by Alibaba Cloud real-name registered users, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibility. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it; once verified, the community will immediately delete the suspected infringing content.