Author | Xi Weng
Nacos 2.0 improves performance by about 10 times by upgrading the communication protocol, framework, and data model, and resolves the performance problems that gradually surfaced after the release of Nacos 1.0. This article stress-tests Nacos 1.0, then upgrades it to Nacos 2.0 while still under load, and compares performance at each stage to show the improvement Nacos 2.0 brings.
Pressure test preparation
Environmental preparation
To simplify deploying and upgrading Nacos and to display the core performance metrics, we purchased a three-node Nacos cluster with 2-core CPU + 4 GB memory from Microservices Engine (MSE, https://cn.aliyun.com/product/aliware/mse).
Pressure test model
To show how the system performs at different scales, we applied pressure gradually: the load was divided into 3 batches that were started one after another, and we observed how the cluster behaved after each batch. In addition, a Dubbo service demo was deployed outside the pressure cluster and called continuously by JMeter at 100 TPS, to simulate the possible impact on real business calls under each level of pressure.
During the stress test, the server and the clients were upgraded at appropriate points. The server was upgraded with the one-click upgrade function provided by MSE, and the clients were upgraded by restarting them in batches.
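As a rough illustration of the gradual pressurization, the sketch below computes the total number of Providers after each batch. The batch sizes are the ones reported in the test process of this article (6000, then 4000, then 4000); the names `BATCH_SIZES` and `cumulative_load` are made up for this example.

```python
from itertools import accumulate

# Providers added by each batch, as reported in this article's test process.
BATCH_SIZES = [6000, 4000, 4000]


def cumulative_load(batch_sizes):
    """Return the total number of Providers after each batch has started."""
    return list(accumulate(batch_sizes))


if __name__ == "__main__":
    for batch, total in enumerate(cumulative_load(BATCH_SIZES), start=1):
        print(f"after batch {batch}: {total} Providers")
```

The running totals are the three pressure levels at which the cluster is observed in the sections that follow.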
Pressure test process
Nacos1.X Server + Nacos1.X Client
First, the first batch of the pressure cluster was started against MSE Nacos 1.2.1. Under the pressure of 6000 Providers, once the cluster stabilized, CPU usage was about 25% and all 6000 instances were maintained stably.
Next, the second batch of the pressure cluster was started, adding 4000 Providers for a total of 10,000. At this point the cluster's CPU peaked at 60% and settled at about 45% in steady state, and the cluster still ran stably.
Under the first two batches the cluster had no stability problems, so the Dubbo calls remained normal and no errors occurred.
When the third batch of the pressure cluster was started, the pressure totaled 14,000 Providers. The cluster briefly reached 13,000 registered instances, then the instance count dropped and the CPU was saturated. Zooming in on a narrower time range shows that, after the drop, the instance count kept fluctuating within a small range.
At the same time, the Dubbo calls began to fail. The Consumer log shows that Dubbo Providers were removed because the server could not sustain this level of pressure, so the calls failed with a No provider error.
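The article does not show how the pressure clients register their Providers. As a loose sketch only, the code below builds (but does not send) one registration request per simulated Provider, assuming the Nacos 1.x Open API endpoint `POST /nacos/v1/ns/instance`. The server address, service name, IP scheme, and port are all hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

NACOS_ADDR = "http://127.0.0.1:8848"  # hypothetical server address


def build_register_request(service_name, ip, port, base_url=NACOS_ADDR):
    """Build, without sending, an instance-registration request."""
    query = urlencode({"serviceName": service_name, "ip": ip, "port": port})
    return Request(f"{base_url}/nacos/v1/ns/instance?{query}", method="POST")


def simulated_providers(count, service_name="stress.provider"):
    """Yield one registration request per simulated Provider.

    The IP addresses are invented so that every instance is distinct.
    """
    for i in range(count):
        ip = f"10.0.{i // 250}.{i % 250 + 1}"
        yield build_register_request(service_name, ip, 20880)


if __name__ == "__main__":
    first_batch = list(simulated_providers(6000))
    print(len(first_batch), first_batch[0].full_url)
```

Actually sending these requests (for example with `urllib.request.urlopen`) and keeping the instances alive is left out; a real pressure client would normally use a Nacos SDK instead of raw HTTP.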
Nacos2.X Server + Nacos1.X Client
Because the server double-writes instances during the upgrade, the number of instances stored on the server is twice the actual number while the upgrade is in progress. Given the test results above, before attempting the upgrade you need either to roll the load back to the first batch of 6000 instances, or to upgrade the configuration and scale out the machines. This article takes the rollback approach: the pressure cluster is stopped and then restarted, and the cluster is allowed to return to normal before the upgrade is performed.
The monitoring graph shows that after the last two batches of pressure were stopped, the cluster quickly returned to normal and ran stably, and the Dubbo calls recovered as well. The upgrade was then performed with the MSE upgrade function. During the upgrade, the performance cost of double-writing caused large CPU jitter, and because double-writing doubled the instance count, the load was effectively equivalent to an extreme pressure of 12,000 instances. The server therefore still showed some jitter, which caused some Dubbo errors. If you upgrade when the cluster is not near its limit, there will be no such impact.
After the server upgrade completed, double-writing stopped, which eliminated its performance cost: CPU usage dropped and stabilized, the instance count no longer jittered, and the Dubbo calls fully recovered. Then, as with the 1.X server, the pressure cluster was started in two batches to compare the two versions under the same pressure.
Because the clients were still on 1.X, the load on the server remained very high. After all of the pressure was started, the CPU reached almost 100%. Although there was no large-scale instance drop as on the 1.X server, a small amount of instance jitter still appeared after running for a while. This indicates that upgrading only the Nacos server to 2.0 brings some improvement but does not completely solve the performance problem.
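The double-write arithmetic can be sketched as follows. The factor of 2 comes from the article; the two reference points (stable at 10,000 Providers, instance drops at 14,000) are only the observations from this particular test on this particular cluster, not general limits, and the function names are made up for this example.

```python
DUAL_WRITE_FACTOR = 2   # each instance is stored twice during the upgrade

STABLE_AT = 10_000      # largest load observed stable on the 1.X server here
UNSTABLE_AT = 14_000    # load at which the 1.X server dropped instances here


def stored_instances_during_upgrade(actual_instances):
    """Instances the server holds while double-writing is active."""
    return actual_instances * DUAL_WRITE_FACTOR


def upgrade_risk(actual_instances):
    """Place the doubled load relative to the two observed data points."""
    stored = stored_instances_during_upgrade(actual_instances)
    if stored <= STABLE_AT:
        return "within the tested stable range"
    if stored >= UNSTABLE_AT:
        return "at or beyond the tested failure point"
    return "between the tested points"


if __name__ == "__main__":
    for actual in (6000, 10_000, 14_000):
        stored = stored_instances_during_upgrade(actual)
        print(actual, stored, upgrade_risk(actual))
```

With 6000 actual instances the server holds 12,000 during the upgrade, which falls between the load that was stable and the load that failed in this test.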
Nacos2.X Server + Nacos2.X Client
To fully release the performance of Nacos 2.0, the clients of the pressure cluster also need to be upgraded to version 2.0 or higher. They were likewise replaced in 3 batches. During this period the Providers restart, so it is normal for the instance count on the server to drop and then recover. As the pressure cluster was upgraded, CPU usage fell significantly. Once the cluster stabilized, CPU usage had dropped from nearly 100% to 20%, and the cluster was running 14,000 instances stably.
Pressure test results
From the tests above, we can summarize how the three-node cluster with 2-core CPU + 4 GB memory performs under the different versions:
Server version | Client version | Stress scale (Providers) | Cluster stability | CPU usage |
---|---|---|---|---|
Nacos1.X | Nacos1.X | 14000 | Completely unstable | 100% |
Nacos2.X (upgrading) | Nacos1.X | 6000 | Some jitter | 100% |
Nacos2.X | Nacos1.X | 14000 | Some jitter | 100% |
Nacos2.X | Nacos2.X | 14000 | Stable | 20% |
These results show that Nacos 2.0 does greatly improve performance. New users are advised to use Nacos 2.0 directly. Existing users are advised to upgrade the server first and then gradually upgrade the clients to realize the full benefit. Finally, the monitoring view of the entire stress test gives an intuitive picture of how the different versions performed at each stage.
More information
Click https://www.aliyun.com/product/aliware/mse to learn more about MSE Nacos 2.0.