Reducing the Construction Cost of Cloud Native Gateways with Instruction Acceleration on Anolis OS

Technical Background

Requirements for the reliability, confidentiality, and integrity of network transmission keep rising, and HTTPS is now widely deployed. The SSL/TLS protocol underneath HTTPS involves cryptographic operations such as encryption, decryption, verification, and signing, all of which consume significant CPU resources. CPU hardware vendors have therefore introduced a variety of acceleration and offload solutions, such as AES-NI, QAT, KAE, and the ARMv8 security extensions.

The software ecosystem has also explored HTTPS performance optimization extensively (refer to [1]). Traditional software optimizations include session reuse, OCSP Stapling, False Start, dynamic record size, TLS 1.3, and HSTS. Software-level optimization alone, however, cannot keep pace with traffic growth, and CPU hardware acceleration has become a common solution in the industry.

What's New in CPU

The recently released third-generation Intel® Xeon® Scalable processor (codenamed Ice Lake) delivers a 30% increase in single-core performance and a more than 50% increase in computing power (refer to [9]).

ISA Instruction Set

On top of the traditional AES-NI acceleration instructions, Ice Lake adds Intel® Crypto Acceleration features based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512), including Vector AES (VAES), Integer Fused Multiply Add (IFMA) for large-number computation, and Galois Field New Instructions (GFNI) (refer to [2]). It also supports the SHA extensions (SHA-NI), which address the SHA-256 performance shortfall of the previous generation.

Figure 1 Ice Lake adds Intel AVX512 instruction set
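
Whether a given machine exposes these instructions can be checked from user space. The following is a minimal sketch, assuming a Linux x86 host where `/proc/cpuinfo` lists the flags `aes`, `vaes`, `avx512ifma`, `gfni`, and `sha_ni`; the function names are illustrative and not part of any library mentioned in this article.

```python
# Flag names as they appear on the "flags" line of /proc/cpuinfo on Linux x86.
CRYPTO_FLAGS = {
    "aes": "AES-NI",
    "vaes": "Vector AES (VAES)",
    "avx512ifma": "AVX-512 IFMA",
    "gfni": "Galois Field New Instructions (GFNI)",
    "sha_ni": "SHA extensions (SHA-NI)",
}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def crypto_features(cpuinfo_text: str) -> dict[str, bool]:
    """Map each feature name to whether its CPU flag is present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for flag, name in CRYPTO_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for name, present in crypto_features(text).items():
        print(f"{name:40s} {'yes' if present else 'no'}")
```

A flag being present only shows that the CPU and kernel expose the instruction; whether OpenSSL or an engine actually uses it depends on how the software stack is built and configured.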

Multi-buffer

Multi-buffer is a technique that gathers multiple independent requests into a batch and processes them in parallel (refer to [10]). Application software uses SIMD vector instructions (AVX/AVX2/AVX512) to accelerate cryptographic computation through asynchronous SSL combined with a multi-buffer library. Open source software such as OpenSSL (BabaSSL, BoringSSL), Nginx (Tengine), DPDK Cryptodev, dm-crypt, and ISA-L already offers partial support.

Figure 2 Multi-buffer with SIMD acceleration
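
The batching pattern behind multi-buffer can be sketched in a few lines. This is an illustration only, not the API of any real multi-buffer library: the class `MultiBufferQueue`, the lane count of 8, and the use of `hashlib` as a stand-in for a SIMD kernel are all assumptions made for the example. Pure Python cannot show the SIMD speedup itself; the sketch shows only how asynchronous submissions are collected and completed as a batch.

```python
import hashlib
from typing import Callable

LANES = 8  # hypothetical batch width; real libraries choose this per algorithm


class MultiBufferQueue:
    """Collects independent jobs and processes them one batch at a time."""

    def __init__(self, batch_fn: Callable[[list[bytes]], list[bytes]],
                 lanes: int = LANES):
        self.batch_fn = batch_fn
        self.lanes = lanes
        self.pending: list[tuple[bytes, Callable[[bytes], None]]] = []
        self.batches_run = 0

    def submit(self, data: bytes, on_done: Callable[[bytes], None]) -> None:
        """Queue a job; run the batch as soon as every lane is filled."""
        self.pending.append((data, on_done))
        if len(self.pending) >= self.lanes:
            self.flush()

    def flush(self) -> None:
        """Process whatever is queued (a timer would trigger this in practice)."""
        if not self.pending:
            return
        jobs, self.pending = self.pending, []
        results = self.batch_fn([data for data, _ in jobs])
        self.batches_run += 1
        for (_, on_done), result in zip(jobs, results):
            on_done(result)  # asynchronous completion callback


def sha256_batch(buffers: list[bytes]) -> list[bytes]:
    # Stand-in for a SIMD kernel, which would hash all lanes in lockstep.
    return [hashlib.sha256(b).digest() for b in buffers]


if __name__ == "__main__":
    done: list[bytes] = []
    queue = MultiBufferQueue(sha256_batch)
    for i in range(10):
        queue.submit(f"request-{i}".encode(), done.append)
    queue.flush()  # drain the jobs left over after the last full batch
    print(len(done), "results in", queue.batches_run, "batches")
```

The trade-off is the usual one for batching: throughput rises because one pass serves several requests, while an individual request may wait briefly for the batch to fill, which is why the caller must be asynchronous.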

Anolis OS

Through cooperation between the Anolis community and Intel engineers (refer to [2]), Alibaba Cloud's operating system was the first on the cloud to support Ice Lake CPUs and to deliver the latest instruction acceleration features (refer to [10]). Ice Lake single-core AES performance is accelerated by 2.2x to 3.4x over the previous-generation Cascade Lake. RSA single-core performance is accelerated by 4.9x, reaching 5.1x that of the previous generation. Tengine (Nginx) single-core SSL/TLS handshake performance is correspondingly accelerated by 3.1x, reaching 3.2x that of the previous generation.

Figure 3 System OpenSSL basic performance

Figure 4 End-to-end application HTTPS TLS short connection performance (Tengine on the left, Nginx on the right)

Acceleration Software Stack

Thanks to the earlier QAT solution, mainstream software such as OpenSSL, Nginx (Tengine), DPDK, and Envoy already supports SSL/TLS acceleration. The Intel® Crypto Acceleration solution reuses and extends QAT's software framework, QAT Engine (refer to [2]), taking it from the dedicated hardware accelerator card scenario to general-purpose CPU instruction acceleration and greatly widening its scope of application.

Figure 5 Intel SSL/TLS Acceleration Software Stack

Tengine Acceleration Solution

Alibaba's unified access gateway, Tengine, carries all of the group's ingress traffic (refer to [1]). As HTTPS was rolled out across the board, the gateway faced serious performance challenges. Business needs drive technical innovation: in 2017 the access gateway took its first step into hardware acceleration and began trying the QAT card solution. Building on the exploration and practice of Tengine with QAT, Alibaba Cloud later launched the MSE cloud-native gateway product, built on the open source Envoy (refer to [5]).

Tengine has supported async SSL and QAT-accelerated SSL/TLS since version 2.2.2 (refer to [3]), with OpenSSL-1.1.0f and QAT_Engine-0.5.30 as the underlying libraries (refer to [4]). With the introduction of CPU instruction acceleration, Anolis OS has upgraded these to OpenSSL-1.1.1g and QAT_Engine-0.6.6, and neither the QAT driver nor qatlib is required.

Cloud Native Gateway Evolution

In 2018, Alibaba began its cloud-native migration, with containers and service mesh as the core technologies. The aim of this evolution was to unify the middleware technology stacks of Alibaba and Ant, so that business teams could focus on business development while the underlying distributed complexity was hidden from them. As an important part of the service mesh direction, we started exploring the next-generation gateway.

Traditional Gateway

A traditional gateway deployment consists of two layers: a traffic gateway and a business gateway (refer to [1]). The traffic gateway provides global policy configuration that is independent of back-end services; Tengine is a typical traffic gateway. The business gateway provides policy configuration at the level of an individual business domain, tightly coupled with the back-end business. As application architecture evolved from the monolith to today's distributed microservices, the business gateway also gained a new name: the microservice gateway.

Figure 6 Traditional gateway

MSE Cloud Native Gateway

In the cloud-native era dominated by container technology and K8s, will the next-generation gateway still follow the two-tier architecture of traffic gateway plus microservice gateway? With this question in mind, and drawing on the gateway technology and operations experience accumulated inside Alibaba, we try to answer what the next-generation gateway should be.

Figure 7 Product portrait of the next generation gateway

The core elements are explained below:

Cloud-native: The gateway must support standard K8s Ingress, the K8s Gateway API, and K8s service discovery. In the cloud-native era K8s has become the cloud OS, and the network inside a K8s cluster is isolated from the outside. The specification that defines how external traffic enters a K8s cluster is K8s Ingress, and the K8s Gateway API is a further evolution of it. A next-generation gateway is therefore bound to support these features.

Embrace open source: The gateway should be built on the open source ecosystem, both drawing on open source and contributing back to it. Most readers will be familiar with this approach.

High extensibility: No gateway's built-in capabilities can cover every user demand, so extensibility is necessary. The vigorous growth of K8s, for example, is inseparable from its open extension capabilities.

Service governance: As application architecture evolves toward distributed microservices, the gateway itself schedules traffic for back-end services, so it should support basic service governance capabilities.

Rich observability: Distributed microservice architecture improves collaboration efficiency, but it also makes troubleshooting and operations harder. As the entry point for traffic, the gateway needs rich observability data to help users locate problems.
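
To make the Ingress requirement concrete, the sketch below builds a standard `networking.k8s.io/v1` Ingress manifest that routes one HTTPS host to one back-end Service. The resource name, host, Service name, port, and TLS secret are hypothetical placeholders, and nothing here is specific to the MSE gateway; it only shows the shape of the standard resource that any Ingress-compatible gateway consumes.

```python
import json


def build_ingress(name: str, host: str, service: str, port: int,
                  tls_secret: str) -> dict:
    """Return a networking.k8s.io/v1 Ingress routing one host to one Service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # TLS is terminated at the gateway with the certificate in this secret.
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}},
                    },
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    manifest = build_ingress("demo", "demo.example.com", "demo-svc", 8080, "demo-tls")
    print(json.dumps(manifest, indent=2))
```

K8s accepts manifests in JSON as well as YAML, so the printed output can be applied directly to a test cluster.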

Based on the above analysis and practice, we believe that in the cloud-native era dominated by containers and K8s, Ingress has become the gateway standard of the K8s ecosystem. This gives the gateway a new mission and makes it possible to merge the traffic gateway and the microservice gateway into one.

The MSE cloud native gateway combines the traffic gateway and the microservice gateway (Ingress) into one. Through hardware acceleration, kernel tuning, and other means, it reduces the resource cost of deploying a gateway by 50% without sacrificing performance.

Figure 8 Cloud native gateway

Advantages of the MSE cloud native gateway:

The gateway connects directly to the service Pod IP without going through the traditional Cluster IP, which lowers RT.
It supports HTTPS hardware acceleration, which increases QPS by about 80%.
It supports a Wasm plug-in marketplace and plug-in hot loading, meeting users' demands for custom plug-ins in multiple languages.
The self-developed Multi-Ingress Controller component lets the Ingress resources of multiple clusters share the same gateway instance.
It is compatible with the native K8s Ingress specification and supports seamless conversion of Nginx Ingress core-function annotations.

Figure 9 Cloud native gateway technology architecture

Customer Case Study

Shanghai Fei Rui Network Technology Co., Ltd. had been using Nginx Ingress and ran into pain points such as high operation and maintenance costs, poor security, and weak native functionality, so it was looking for an alternative. After learning about the MSE cloud native gateway, the customer tested it before going live. The HTTPS hardware acceleration feature was very well received during testing, and the acceleration effect after enabling it was obvious. Given the gateway's Nginx Ingress annotation compatibility and its HTTPS hardware acceleration, the customer finally chose the MSE cloud native gateway to replace its Nginx Ingress gateway.

Figure 10 Fei Rui customer migration to cloud native gateway

Business Configuration

HTTPS hardware acceleration is available as a commercial feature of the MSE cloud native gateway and only needs to be enabled at the time of purchase. The result after enabling it is shown below:

Figure 11 Cloud native gateway enables hardware acceleration

Acceleration Effect

Stress test results before and after enabling acceleration:

On a 1C2G instance, stress-test HTTPS QPS increased from 1004 to 1873, an increase of about 86%.
The TLS handshake RT was more than halved, from 313.84 ms to 145.81 ms.
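
The arithmetic behind these two figures, and what they could mean for sizing, can be worked through as follows. The measured values come from the stress test above; the 100,000 QPS target and the assumption that throughput scales linearly with the number of 1C2G instances are hypothetical and only serve to illustrate the calculation.

```python
import math


def percent_increase(before: float, after: float) -> float:
    """Relative growth from before to after, in percent."""
    return (after - before) / before * 100.0


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from before to after, in percent."""
    return (before - after) / before * 100.0


def instances_needed(target_qps: float, qps_per_instance: float) -> int:
    """Instances required for a target load, assuming linear scaling."""
    return math.ceil(target_qps / qps_per_instance)


QPS_BEFORE, QPS_AFTER = 1004, 1873        # measured HTTPS QPS on 1C2G
RT_BEFORE_MS, RT_AFTER_MS = 313.84, 145.81  # measured TLS handshake RT

if __name__ == "__main__":
    print(f"QPS gain:     {percent_increase(QPS_BEFORE, QPS_AFTER):.1f}%")       # 86.6%
    print(f"RT reduction: {percent_reduction(RT_BEFORE_MS, RT_AFTER_MS):.1f}%")  # 53.5%
    print(f"RT speedup:   {RT_BEFORE_MS / RT_AFTER_MS:.2f}x")                    # 2.15x

    target = 100_000  # hypothetical peak HTTPS QPS
    print("instances before:", instances_needed(target, QPS_BEFORE))
    print("instances after: ", instances_needed(target, QPS_AFTER))
```

Under these assumptions the same hypothetical load needs roughly half as many instances, which is the mechanism by which higher per-core throughput translates into lower gateway cost.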

Solution advantages:

No dedicated hardware is required, operation and maintenance costs are low, and the gateway is easy to scale out and in elastically.
General-purpose CPU acceleration features apply to a wider range of scenarios.

Reference links:

[1] https://developer.aliyun.com/article/870630

[2] https://openanolis.cn/sig/crypto/doc/390714951012679780

[3] https://tengine.taobao.org/changelog_cn.html

[4] http://tengine.taobao.org/document_cn/tengine_qat_ssl_cn.html

[5] https://www.aliyun.com/product/aliware/mse

[6] https://developer.aliyun.com/article/941940

[7] 3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation: https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf

[8] https://01.org/kubernetes/solutions/QAT-envoy-solution

[9] https://developer.aliyun.com/article/783678

[10] crypto-acceleration-enabling-path-future-computing: