The 618 shopping festival is in full swing. The article "The 618 Big Promotion is Coming, Talking About How to Prepare for the Big Promotion" gave a comprehensive overview of the methodology and technical means for keeping systems highly available during a big promotion. This article focuses on the gateway and discusses in depth how to keep it highly available in big-promotion scenarios. It covers the following points:

  1. The Importance of Gateway High Availability Protection
  2. The "next-generation gateway architecture" of the MSE cloud native gateway has great advantages in high-availability protection
  3. High Availability Protection Using MSE Cloud Native Gateway (Video Demo)

The Importance of Gateway High Availability Protection

Why is high availability protection at the gateway so important in a big-promotion scenario? In short, the gateway can turn many sources of uncertainty into certainty, a role no other component can fill. There are three perspectives:

The first is the uncertainty of traffic peaks. Rate limiting rules must turn unpredictable traffic into predictable traffic, and business services find this hard to do on their own. Rate limiting only protects a service if that service can keep its CPU load normal while absorbing the burst. Even if a business service enforces an application-layer QPS limit, instantaneous high concurrency brings a flood of new network-layer connections. Those connections can still drive CPU usage up sharply, and the rate limiting rules then become useless. Business services should focus on application-layer business logic. Scaling them out to absorb network-layer overhead they are not built for is very expensive. As the entry point for business traffic, the gateway must by definition be good at handling highly concurrent network traffic, and this performance is a key measure of a gateway's capability. The better it handles high concurrency, the lower the resource cost, and the more effectively it can turn big-promotion traffic from uncertain to certain.

The second is the uncertainty of user behavior. Teams need multiple rounds of stress testing that simulate user behavior in each promotion scenario, so that system bottlenecks and optimization points are found in advance. The gateway is both the entry point for user traffic and the final exit for backend responses. That makes it the natural place to inject simulated user traffic during stress tests. It is also an essential point for observing stress test metrics and evaluating user experience. Running stress tests, observing the results, and tuning the rate limiting configuration all at the gateway greatly accelerates building a high-availability system, achieving twice the result with half the effort.

The third is the uncertainty of security attacks. Big promotions are peak season for fraud rings and other malicious actors (the "black and gray industry"). Their abnormal traffic can easily trigger rate limiting rules and block access for normal users. Gateway-level traffic security capabilities such as WAF identify and intercept abnormal traffic early, and automatically add the offending IPs and cookies to a blacklist. This keeps that traffic from counting against the rate limiting threshold and also protects the backend business logic. This is another essential part of high availability protection for big promotions.
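The ordering matters: blocked traffic should be rejected before it is counted against the rate limit. The sketch below uses a hypothetical blocklist keyed by client IP and a simple fixed-window counter. It shows that a flood from a blocked client leaves the quota intact for normal users. It illustrates the ordering only and is not how WAF works internally.

```go
package main

import "fmt"

// Gate checks a blocklist before rate limiting, so requests from blocked
// clients are rejected without consuming rate-limit quota.
// All names here are hypothetical.
type Gate struct {
	blocked map[string]bool // blocklist keyed by client IP
	limit   int             // max requests admitted per window
	count   int             // requests admitted in the current window
}

// Possible results of Gate.Check.
const (
	Allowed     = "allowed"
	Blocked     = "blocked"      // e.g. answered with HTTP 403
	RateLimited = "rate_limited" // e.g. answered with HTTP 429
)

// Check decides the fate of one request from the given client IP.
func (g *Gate) Check(ip string) string {
	if g.blocked[ip] {
		return Blocked // rejected before it can touch the quota
	}
	if g.count >= g.limit {
		return RateLimited
	}
	g.count++
	return Allowed
}

// ResetWindow starts a new fixed window (for example, once per second).
func (g *Gate) ResetWindow() { g.count = 0 }

func main() {
	g := &Gate{blocked: map[string]bool{"203.0.113.7": true}, limit: 3}
	// A bot floods from a blocked IP before a normal user arrives.
	for i := 0; i < 100; i++ {
		g.Check("203.0.113.7")
	}
	fmt.Println(g.Check("198.51.100.1")) // allowed: the bot used no quota
}
```

If the two checks were swapped, the bot's 100 requests would exhaust the 3-request window and the normal user would be rate limited.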

Advantages of MSE Cloud Native Gateway

Architectural Advantage

The MSE cloud native gateway implements a three-in-one "next-generation gateway architecture" that combines the traffic gateway, microservice gateway, and security gateway. Here is how it compares with the common multi-layer gateway architecture:

  • Common multi-layer gateway architecture

[Figure: common multi-layer gateway architecture]

In this architecture, a WAF gateway provides security, SLB provides load balancing, and an Ingress gateway serves as the cluster entry gateway (in non-K8s scenarios an extra Nginx layer is also deployed). Zuul serves as the microservice gateway. When facing large bursts of traffic, the capacity of every gateway layer must be evaluated. Each layer is a potential bottleneck and may need to be scaled out, which brings heavy resource costs and operations labor costs. Moreover, every additional gateway layer adds another availability risk.
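A rough worked estimate makes the "one more layer, one more risk" point concrete. It assumes each layer is a serial dependency that fails independently. Purely for illustration, it also assumes every layer has the same availability of 99.95%. These are hypothetical figures, not measurements of any product.

```latex
A_{\text{chain}} = \prod_{i=1}^{n} A_i
% Four layers (WAF, SLB, Ingress, Zuul), each A_i = 0.9995:
A_4 = 0.9995^4 \approx 0.9980 \;\Rightarrow\; (1 - 0.9980) \times 8760\,\text{h} \approx 17.5\,\text{h of downtime per year}
% Two layers (SLB + MSE gateway), each A_i = 0.9995:
A_2 = 0.9995^2 \approx 0.9990 \;\Rightarrow\; (1 - 0.9990) \times 8760\,\text{h} \approx 8.8\,\text{h of downtime per year}
```

Under these assumptions, removing two serial layers roughly halves the expected downtime contributed by the gateway path.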

  • MSE Cloud Native Gateway Architecture

[Figure: MSE cloud native gateway architecture]

With the MSE cloud native gateway, SLB is kept for load balancing, and a single gateway layer provides all the capabilities of the cluster entry gateway, WAF gateway, and microservice gateway. To prepare for a big promotion, operations staff only need to focus on the MSE gateway layer to manage all ingress traffic and provide high availability protection. This is the "next-generation gateway architecture": it keeps everything simple, and only what is simple can be relied on.

Performance advantage

As shown in the figure below, the MSE cloud native gateway delivers twice the throughput of the Nginx Ingress Controller. For a detailed performance comparison, see the article "K8s Gateway Selection Preliminary Judgment: Nginx or Envoy?". Facing peak traffic with an underperforming gateway forces the enterprise to pay for more ECS resources while still worrying whether the gateway itself can withstand the load. If it cannot, the losses can be severe.

[Figure: throughput comparison, MSE cloud native gateway vs. Nginx Ingress Controller]

Gateway specification: 4 nodes, each with 16 cores and 32 GB of memory

ECS instance type: ecs.c7.8xlarge
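The cost argument can be sketched as simple capacity arithmetic. Assume the gateway tier scales linearly. Then the node count needed for a peak is roughly the peak QPS divided by per-node capacity at a target utilization u. The symbols and numbers below are hypothetical, not benchmark results.

```latex
N = \left\lceil \frac{Q_{\text{peak}}}{q_{\text{node}} \cdot u} \right\rceil
% Hypothetical inputs: Q_peak = 200{,}000 QPS, u = 0.6
q_{\text{node}} = 25{,}000 \;\Rightarrow\; N = \lceil 200{,}000 / 15{,}000 \rceil = \lceil 13.3 \rceil = 14
q_{\text{node}} = 50{,}000 \;\Rightarrow\; N = \lceil 200{,}000 / 30{,}000 \rceil = \lceil 6.7 \rceil = 7
```

Under these assumptions, doubling per-node throughput halves the node count and the matching ECS spend.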

For high availability, the MSE cloud native gateway has Alibaba's Sentinel high availability module built in. Sentinel has been battle-tested by years of Double 11 traffic and offers rich rate limiting protection, including flow control rules, concurrency rules, and circuit breaker rules. Together these rules fully protect the availability of backend services. The MSE cloud native gateway can also warm up traffic by sending only a small amount of traffic to newly started nodes at first. In big-promotion scenarios, a large burst of requests can hit resources that are still initializing, causing slow responses and blocked requests. Warm-up avoids this, so newly scaled-out nodes serve normally and the user experience does not suffer.


For stress testing, the MSE gateway stress testing scenario in Alibaba Cloud PTS makes it easy to launch a stress test against a specified gateway instance. The MSE cloud native gateway adds rate limiting and observability, so you can stress test, observe, and adjust the rate limiting configuration at the same time. The result is a one-stop way to build a high availability protection system.


For security, the MSE cloud native gateway integrates WAF functionality and also offers a variety of authentication and security protection plug-ins in its plug-in market. Users can also write their own Wasm plug-ins in several languages (Golang/JS/Rust/C++, etc.) to implement authentication and protection logic specific to their business scenarios. These plug-ins block abnormal traffic before it reaches the rate limiting rules, so normal traffic is not affected.
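As an example of business-specific authentication that a custom plug-in might perform, the sketch below verifies an HMAC-SHA256 request signature using only the Go standard library. It deliberately does not use any Wasm plug-in SDK. Reading headers and rejecting the request would go through your plug-in framework's own API. The signed fields, their order, and the shared secret are hypothetical. A production check would also reject stale timestamps to prevent replay attacks.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of msg under key.
func sign(key []byte, msg string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(msg))
	return hex.EncodeToString(m.Sum(nil))
}

// verifyRequest checks a client-supplied hex signature over
// method, path and timestamp joined by newlines (hypothetical scheme).
// hmac.Equal compares in constant time to avoid timing leaks.
func verifyRequest(key []byte, method, path, ts, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false // malformed signature: reject
	}
	m := hmac.New(sha256.New, key)
	m.Write([]byte(method + "\n" + path + "\n" + ts))
	return hmac.Equal(got, m.Sum(nil))
}

func main() {
	key := []byte("demo-secret") // hypothetical shared secret
	sig := sign(key, "POST\n/api/order\n1718000000")
	fmt.Println(verifyRequest(key, "POST", "/api/order", "1718000000", sig))  // true
	fmt.Println(verifyRequest(key, "POST", "/api/refund", "1718000000", sig)) // false
}
```

Requests that fail this check would be rejected at the gateway, before they are counted by any rate limiting rule.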


MSE cloud native gateway high availability protection in practice

Click to watch the live replay:

https://yqh.aliyun.com/live/detail/28697


Alibaba Cloud Native