Author: Zhang Tianyi (Chengtan)
To avoid confusion, let's clarify some key definitions:
- Traditional gateway: A gateway that has not been containerized and does not run on K8s. It consists of two layers, a traffic gateway and a business gateway. The traffic gateway provides global policy configuration that is independent of the back-end business; Tengine is a typical example. The business gateway provides policy configuration at the level of an individual business domain and is tightly coupled with the back-end business. As application architecture evolved from monoliths to distributed microservices, the business gateway also acquired a new name: the microservice gateway.
- K8s gateway: The cloud native gateway, also known as the next-generation gateway. Ingress has become the gateway standard of the K8s ecosystem and is driving the merger of the traffic gateway and the business gateway into one. Implementations of the Ingress specification fall mainly into two camps, Nginx-based and Envoy-based. The Nginx-based Nginx Ingress Controller is currently the choice of most K8s clusters, while Envoy-based implementations are rising stars with the potential to overtake it.
- MSE Cloud Native Gateway: A deeply optimized cloud service based on Envoy.
This article compares the two open source implementations in terms of performance, cost, reliability, and security, in the hope of providing a reference for enterprises that are selecting a K8s gateway.
Performance and Cost
The throughput of the MSE cloud native gateway is almost double that of the Nginx Ingress Controller, especially when transmitting small text payloads.
Gateway specifications: 16 cores, 32 GB × 4 nodes
ECS model: ecs.c7.8xlarge
As CPU load increases, the throughput gap becomes more pronounced. The following figure shows the situation when CPU usage reaches 70%:
The Nginx Ingress Controller's throughput drops under high load because its pods restart. For details, see the analysis in the "Reliability" section below.
As network security receives more and more attention, HTTPS has become widely used for transport encryption on the Internet. On the gateway side, the asymmetric cryptography used in the TLS handshake consumes the majority of CPU resources. For this scenario, the MSE cloud native gateway uses CPU SIMD instructions to hardware-accelerate the TLS encryption and decryption algorithms:
The stress test data in the figure above shows that, with TLS hardware acceleration enabled, the TLS handshake latency is roughly halved compared with ordinary HTTPS requests, and the peak QPS increases by more than 80%.
Based on the above data, the MSE cloud native gateway can match the throughput of the Nginx Ingress Controller with only half of the resources. In HTTPS scenarios, where hardware acceleration applies, throughput can be improved even further.
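The "half of the resources" claim can be illustrated with a short capacity calculation. This is only a sketch: the per-node QPS figures and the target QPS below are hypothetical placeholders, not measurements from this article, and the 2x ratio is simply the "almost double" result rounded up.

```python
import math


def nodes_needed(target_qps: float, qps_per_node: float) -> int:
    """Smallest number of gateway nodes that can serve target_qps."""
    if qps_per_node <= 0:
        raise ValueError("qps_per_node must be positive")
    return math.ceil(target_qps / qps_per_node)


# Hypothetical figures for illustration only.
NGINX_QPS_PER_NODE = 50_000
MSE_QPS_PER_NODE = 2 * NGINX_QPS_PER_NODE  # assumes the "almost double" ratio
TARGET_QPS = 400_000

nginx_nodes = nodes_needed(TARGET_QPS, NGINX_QPS_PER_NODE)
mse_nodes = nodes_needed(TARGET_QPS, MSE_QPS_PER_NODE)
print(f"Nginx Ingress Controller nodes: {nginx_nodes}")
print(f"MSE cloud native gateway nodes: {mse_nodes}")
```

With real benchmark numbers substituted for the placeholders, the same calculation gives a first estimate of how many nodes each gateway would need for a given peak load.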
Reliability
As mentioned above, under high load the Nginx Ingress Controller's pods restart, which reduces throughput. There are two main causes of these restarts:
- The liveness health check (livenessProbe) is prone to timing out and failing under high load. The community reduced redundant checks in version 0.34, but the problem still exists.
- When Prometheus is enabled to collect monitoring metrics, an OOM can occur under high load and the container is killed. For details, see: https://github.com/kubernetes/ingress-nginx/pull/8397
Both problems stem from the deployment architecture of the Nginx Ingress Controller. The control plane (the controller, implemented in Go) and the data plane (Nginx) run as processes in the same container. Under high load, the data plane and control plane processes compete for CPU. The control plane process is responsible for health checks and metrics collection, so when it is starved of CPU, requests back up, which leads to OOM and health check timeouts.
This situation is extremely dangerous: it can cause an avalanche effect in the gateway under high load and seriously affect the business. The MSE cloud native gateway isolates the data plane from the control plane, which gives it an architectural advantage in reliability:
As the figure above shows, the MSE cloud native gateway is not deployed in the user's K8s cluster but runs in a fully managed mode. This mode brings further reliability advantages:
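The restart mechanism can be sketched with a small simulation of liveness probing. It is a simplified model, not the kubelet's actual implementation: a probe counts as failed when its latency exceeds the timeout, and the container is restarted after a number of consecutive failures. The defaults below mirror the Kubernetes probe field defaults (timeoutSeconds of 1, failureThreshold of 3); the values in a real Nginx Ingress Controller deployment may differ, and the latency samples are hypothetical.

```python
from typing import Iterable, List


def restart_points(
    probe_latencies_s: Iterable[float],
    timeout_s: float = 1.0,
    failure_threshold: int = 3,
) -> List[int]:
    """Return the probe indexes at which the container would be restarted.

    A probe fails when its latency exceeds timeout_s. After
    failure_threshold consecutive failures the container is restarted
    and the failure counter starts again from zero.
    """
    restarts = []
    consecutive_failures = 0
    for index, latency in enumerate(probe_latencies_s):
        if latency > timeout_s:
            consecutive_failures += 1
        else:
            consecutive_failures = 0
        if consecutive_failures >= failure_threshold:
            restarts.append(index)
            consecutive_failures = 0
    return restarts


# Hypothetical probe latencies in seconds. Under high load the control
# plane is starved of CPU, so it answers health checks more slowly.
normal_load = [0.05, 0.08, 0.06, 0.07, 0.05, 0.09]
high_load = [0.4, 1.5, 2.2, 0.9, 1.8, 2.5, 3.1, 2.0]

normal_restarts = restart_points(normal_load)
high_restarts = restart_points(high_load)
print("restarts under normal load:", normal_restarts)
print("restarts under high load:", high_restarts)
```

In this model the data plane may still be serving traffic when the restart is triggered; the restart happens only because the health check endpoint, which shares CPU with it, responds too slowly.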
- The gateway does not share an ECS node with business containers
- Multiple gateway instances do not share one ECS node
- An SLA guarantee is provided for gateway availability
To achieve a highly reliable deployment with the Nginx Ingress Controller, it is generally necessary to dedicate ECS nodes to the gateway and to deploy several of them to avoid a single point of failure, which drives resource costs up sharply. In addition, because the Nginx Ingress Controller is deployed in the user's cluster, it cannot offer an SLA guarantee for gateway availability.
Security
Different versions of the Nginx Ingress Controller still carry a number of CVE vulnerabilities. The specific affected versions are shown in the following table:
After migrating from the Nginx Ingress Controller to the MSE Cloud Native Gateway, all of these CVE vulnerabilities are fixed at once. In addition, the MSE Cloud Native Gateway provides a smooth upgrade solution: when a new security vulnerability appears, the gateway version can be upgraded quickly, with minimal impact on the business during the upgrade.
In addition, the MSE cloud native gateway has a built-in Alibaba Cloud Web Application Firewall (WAF). Compared with a traditional WAF, the user request path is shorter and the response time (RT) is lower; compared with the Nginx Ingress Controller, it can provide fine-grained, route-level protection. Currently 2/3 of the Alibaba Cloud Web Application Firewall architecture.
MSE Cloud Native Gateway
The MSE cloud native gateway is now available in the Alibaba Cloud Container Service App Market and can replace the default gateway component, the Nginx Ingress Controller.
The MSE cloud native gateway has been used at large scale as gateway middleware within Alibaba Group, and its performance and stability have been proven by years of Double Eleven traffic.
In the K8s container service scenario, its main advantages over the Nginx Ingress Controller installed by default are as follows:
- Stronger performance and a more reasonable architecture, which can reduce gateway resource costs by at least 50%
- Better reliability with an SLA guarantee, fully managed with no operation and maintenance required, and backed by the Alibaba Cloud technical team
- Better security protection, with existing CVE vulnerabilities resolved in one step and built-in WAF protection
It also provides richer functionality in routing policies, grayscale release management, observability, and more, and supports developing custom extension plug-ins in multiple languages. For a detailed comparison, see: https://help.aliyun.com/document_detail/424833.html
Smooth Migration Solution
Deploying the MSE cloud native gateway does not directly affect traffic on the original gateway. By adjusting DNS weights, service traffic can be migrated smoothly, and the back-end services remain completely unaware of the change. The core traffic migration process is shown in the following figure:
The complete steps are as follows:
- Step 1: Find mse-ingress-controller in the application market of Container Service and install it in the target ACK cluster
- Step 2: Configure MseIngressConfig (configuration guide) in K8s to automatically create an MSE cloud native gateway of the specified specification
- Step 3: Obtain the IP of the MSE cloud native gateway from the address field of the Ingress, add a local hosts entry that resolves the business domain name to that IP, and complete the business test
- Step 4: Modify the DNS weight configuration of the business domain name, add the cloud native gateway IP, and gradually increase its weight to shift traffic in grayscale
- Step 5: After the grayscale rollout is complete, remove the original IP of the business domain name from the DNS configuration so that all traffic switches to the cloud native gateway
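The effect of the weight changes in Steps 4 and 5 can be sketched as follows. The weight schedule is a hypothetical example, not a recommended rollout plan, and the function computes only the expected share of DNS answers that point at the new gateway. Actual traffic will lag behind these figures because resolvers and clients cache DNS records until their TTL expires.

```python
def traffic_share(old_weight: int, new_weight: int) -> float:
    """Expected fraction of DNS answers that point at the new gateway."""
    total = old_weight + new_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return new_weight / total


# Hypothetical rollout: (original gateway weight, cloud native gateway weight).
# The last entry corresponds to Step 5, where the original IP is removed.
schedule = [(100, 0), (90, 10), (50, 50), (10, 90), (0, 100)]

shares = [traffic_share(old, new) for old, new in schedule]
for (old, new), share in zip(schedule, shares):
    print(f"old={old:3d} new={new:3d} -> {share:.0%} of resolutions hit the new gateway")
```

Holding each stage long enough to observe error rates and latency on the new gateway before moving to the next weight keeps the migration reversible: setting the new gateway's weight back to 0 returns traffic to the original gateway.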