Author: Fan Yang (Yang Shao)
Introduction to Kubernetes Ingress
Usually, the network environment inside a Kubernetes cluster is isolated from the outside world; that is, clients outside the cluster cannot directly access services running inside it. This is essentially a question of how to connect different network domains. A common way to enable such cross-domain access is to introduce an entry point for the target cluster: all external traffic bound for the cluster must pass through this entry point, which then forwards each request to the target node.
Similarly, the Kubernetes community addresses the problem of exposing in-cluster services by adding entry points. True to its consistent style of solving a class of problems by defining a standard, Kubernetes has unified the abstraction of the cluster entry point and offers three solutions: NodePort, LoadBalancer, and Ingress. Below is a comparison of the three options:
The comparison shows that Ingress is the option best suited to business use: more complex secondary routing can be built on top of it, and it is currently the mainstream choice among users.
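For reference, a standard Ingress declares layer-7 rules that map a host and path to a backend Service. A minimal sketch (all names here are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
spec:
  rules:
  - host: demo.example.com      # route requests for this host
    http:
      paths:
      - path: /api              # match requests whose path starts with /api
        pathType: Prefix
        backend:
          service:
            name: demo-service  # in-cluster Service that receives the traffic
            port:
              number: 80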
Current State of Kubernetes Ingress
Although Kubernetes standardizes and abstracts cluster ingress traffic management, it covers only basic HTTP/HTTPS forwarding and cannot meet the traffic-governance needs of large-scale, complex cloud-native distributed applications. For example, the standard Ingress does not support common traffic policies such as traffic splitting, cross-origin access (CORS), rewrites, and redirects. There are two mainstream ways to address this: one is to extend Ingress by defining Key-Value pairs in its Annotations; the other is to define new ingress traffic rules with Kubernetes CRDs. As shown below:
Kubernetes Ingress Best Practices
This section expands on Kubernetes Ingress best practices in the following five areas.
- Traffic isolation: deploy multiple sets of Ingress Providers to reduce the blast radius
- Grayscale release: how to use Ingress Annotations for grayscale release
- Business domain splitting: how to design APIs by business domain
- Zero trust: what zero trust is, why it is needed, and how to practice it
- Performance tuning: some practical performance-tuning methods
Traffic Isolation
In real business scenarios, the backend services in a cluster serve both external users and other internal clusters. We generally call traffic entering the cluster from outside north-south traffic and traffic between internal services east-west traffic. To save machine cost and O&M effort, some users have north-south and east-west traffic share a single Ingress Provider. This creates new problems: fine-grained traffic management can no longer be applied separately to external and internal traffic, and the impact of a failure is amplified. The best practice is to deploy independent Ingress Providers for the external-network and internal-network scenarios, sizing the replica count and hardware resources to the actual request volume, which both reduces the blast radius and maximizes resource utilization.
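One hedged way to express this in Kubernetes is one IngressClass per independently deployed Ingress Provider, so that Internet-facing and intranet Ingress resources are watched by different instances (class and controller names below are illustrative):

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: external            # referenced by Ingresses carrying north-south traffic
spec:
  controller: example.com/ingress-external
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: internal            # referenced by Ingresses carrying east-west traffic
spec:
  controller: example.com/ingress-internal

Each Ingress Provider deployment is then configured to watch only its own class, so its replicas and resources can be scaled independently.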
Grayscale Release
As a business iterates rapidly, its application services face frequent version upgrades. The most primitive and simple approach is to stop the old version of the online service, then deploy and start the new one. Delivering a new version directly to all users this way poses two serious problems. First, between stopping the old version and starting the new one, the application is unavailable and the request success rate drops to zero. Second, if the new version contains a serious bug, rolling back to the old version makes the service temporarily unavailable again, which not only hurts user experience but also introduces many instabilities into the overall business system.
So how can we satisfy the demand for rapid business iteration while keeping the application highly available to the outside world during upgrades?
I think the following core issues need to be addressed:
- How to reduce the impact of the upgrade?
- How to quickly roll back to the stable version when there is a bug in the new version?
- How to solve the defect that standard Ingress does not support traffic splitting?
For the first two issues, the consensus practice in the industry is grayscale release, commonly known as canary release. The idea of canary release is to divert a small share of requests to the new version, so deploying it initially requires only a very small number of machines. After verifying that the new version meets expectations, traffic is gradually shifted from the old version to the new one; during this period the new version can be scaled out according to how traffic is distributed across the two versions, while the old version is scaled in, so the underlying resources are used to the fullest.
In the section on the current state of Ingress, we mentioned two popular ways to extend it. The third problem can be solved by adding Key-Value pairs to Annotations: we define the policy configuration required for grayscale release in Annotations, such as the Header or Cookie that marks grayscale traffic and how its value is matched (exact or regular-expression matching). The Ingress Provider then recognizes these newly defined Annotations and parses them into its own routing rules; the key point is therefore that the chosen Ingress Provider must support rich routing capabilities.
Grayscale Release by Header
When verifying a new version with small traffic, we can selectively treat online traffic that carries certain characteristics as that small traffic. The Header and Cookie of a request can serve as such characteristics, so for the same API we can split online traffic by Header or Cookie. If real traffic shows no usable differences in its headers, we can manufacture some traffic carrying a grayscale header in the online environment for verification. We can also verify the new version in batches by client importance: for example, requests from ordinary users are routed to the new version first, and VIP users are drained over only after verification completes. Such user and client information is usually stored in a Cookie.
Taking Nginx-Ingress as an example, traffic splitting is supported through Ingress Annotations. The schematic for grayscale by Header is as follows:
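To route by Cookie instead, for example sending only users who carry a marker cookie to the canary, Nginx-Ingress offers a sibling Annotation. A hedged sketch (the cookie name vip_user and all service names are illustrative; Nginx-Ingress routes to the canary only when the cookie's value is "always"):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # requests whose "vip_user" cookie is set to "always" go to the canary
    nginx.ingress.kubernetes.io/canary-by-cookie: "vip_user"
  name: demo-canary-cookie
spec:
  ingressClassName: nginx
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /version
        pathType: Exact
        backend:
          service:
            name: demo-v2        # Service of the new (canary) version
            port:
              number: 80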
Grayscale Release by Weight
Grayscale by Header can serve the new version to specific requests or users, but the volume of requests that will reach the new version is hard to estimate, so the machines allocated to it may not be fully utilized. Grayscale by weight controls the traffic ratio precisely, which makes allocating machine resources easy: after the initial small-traffic verification passes, the upgrade is completed by gradually adjusting the traffic weight. This method is simple to operate and easy to manage, but online traffic is directed to the new version indiscriminately, which may affect the experience of important users. The schematic for grayscale by weight is as follows:
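In Annotation form, again taking Nginx-Ingress as an example, sending roughly 10% of traffic to the new version could look like the hedged sketch below (host and service names illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # route about 10% of requests to the canary Service below
    nginx.ingress.kubernetes.io/canary-weight: "10"
  name: demo-canary-weight
spec:
  ingressClassName: nginx
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /version
        pathType: Exact
        backend:
          service:
            name: demo-v2
            port:
              number: 80

Raising the weight step by step (for example 10 -> 30 -> 50 -> 100) completes the migration.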
Business Domain Splitting
As cloud-native applications continue to grow, developers split the original monolithic architecture at a finer granularity, turning the service modules of the monolith into independently deployed and running microservices, each with its life cycle owned solely by the corresponding business team. This effectively solves the monolith's problems of insufficient agility and low flexibility. But no architecture is a silver bullet: solving old problems inevitably introduces new ones. A monolithic application could expose its services through a single layer-4 SLB, whereas a distributed application relies on Ingress for layer-7 traffic distribution, and at that point designing good routing rules becomes particularly important.
We usually split services by business or functional domain, so we can follow the same principle when exposing them through Ingress: when designing the external APIs of the microservices, add a representative business prefix to the original Path. After a request matches a route and before it is forwarded to the backend service, the Ingress Provider strips the business prefix by rewriting the Path. The workflow is as follows:
This API design principle keeps the set of exposed services easy to manage, enables finer-grained authentication and authorization based on the business prefix, and facilitates unified observability for the services of each business domain.
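A hedged sketch of this prefix-then-rewrite pattern using Nginx-Ingress's rewrite Annotation (the order prefix, host, and service name are illustrative): a request for /order/create matches the capture groups and is forwarded to the backend as /create.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    # strip the business prefix: /order/create -> /create
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: order-api
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /order(/|$)(.*)    # business prefix plus the rest of the path
        pathType: ImplementationSpecific
        backend:
          service:
            name: order-service
            port:
              number: 80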
Zero Trust
Security is always public enemy number one for business applications, and it accompanies the entire life cycle of business development. Moreover, the external Internet environment keeps getting more complex, internal business architectures keep growing, and deployments now span public, private, and hybrid clouds, so security problems are becoming ever more severe. Zero trust was born as a new design model in the security field. It holds that all users and services inside or outside the application network are untrusted: their identity must be authenticated before a request is initiated or processed, and all authorization follows the principle of least privilege. Simply put: trust no one, verify everything.
The following figure shows the end-to-end zero-trust architecture across external users -> Ingress Provider -> backend services:
- Between external users and the Ingress Provider: the external user authenticates the Ingress Provider by verifying the certificate it presents against an authoritative certificate authority; the Ingress Provider authenticates and authorizes the external user through the JWT the user provides.
- Between the Ingress Provider and backend services: the Ingress Provider authenticates the backend service by verifying the certificate it presents against the internal private certificate authority, and the backend service in turn authenticates the Ingress Provider by verifying its certificate against the same internal private certificate authority, forming mutual TLS. A minimal sketch of the user-facing TLS leg follows.
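The user-facing leg can be expressed with the standard Ingress tls field; a hedged sketch (host and names illustrative; the Secret demo-tls-secret holds the certificate and key the gateway presents to external users, while JWT validation and the mTLS leg to the backends are configured through provider-specific means):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-tls
spec:
  ingressClassName: mse
  tls:
  - hosts:
    - demo.example.com
    secretName: demo-tls-secret   # TLS certificate/key Secret (illustrative)
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-service
            port:
              number: 80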
Performance Tuning
All external traffic must first pass through the Ingress Provider, so it is the main performance bottleneck and faces high requirements for concurrency and performance. Setting aside the performance differences among individual Ingress Providers, we can unlock further performance by tuning kernel parameters. Based on Alibaba's years of practice at the cluster access layer, the following kernel parameters are worth adjusting appropriately (a Pod-level sketch follows the list):
- Increase the capacity of the TCP connection queue: net.core.somaxconn
- Increase the available port range: net.ipv4.ip_local_port_range
- Reuse TCP connections in the TIME_WAIT state: net.ipv4.tcp_tw_reuse
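These are namespaced kernel parameters in Kubernetes, so one way to apply them to an Ingress Provider Pod is through its securityContext. A hedged sketch (image and values illustrative; note that, apart from net.ipv4.ip_local_port_range, these sysctls are considered unsafe and must first be allowed via the kubelet's --allowed-unsafe-sysctls flag):

apiVersion: v1
kind: Pod
metadata:
  name: ingress-provider
spec:
  securityContext:
    sysctls:
    # enlarge the TCP accept-queue limit
    - name: net.core.somaxconn
      value: "65535"
    # widen the local port range available for outbound connections
    - name: net.ipv4.ip_local_port_range
      value: "1024 65535"
    # allow reusing sockets in TIME_WAIT for new outbound connections
    - name: net.ipv4.tcp_tw_reuse
      value: "1"
  containers:
  - name: ingress-provider
    image: registry.example.com/ingress-provider:v1   # illustrative image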
Another optimization angle is the hardware: fully unleashing the computing power of the underlying hardware further improves application-layer performance. HTTPS is now the dominant way to serve public-network requests, but because of the TLS handshake it carries a considerable performance cost compared with HTTP. With the substantial improvement of CPU performance, the CPU's SIMD mechanism can greatly accelerate TLS. This optimization depends on support from the machine's hardware as well as from the internal implementation of the Ingress Provider.
At present, the MSE cloud native gateway, built on the Istio-Envoy architecture and combined with Alibaba Cloud's seventh-generation ECS, is the first to deliver TLS hardware acceleration, greatly improving HTTPS performance without increasing users' resource costs.
New Choice for Ingress Provider - MSE Cloud Native Gateway
With the continuous evolution of cloud-native technology and the ever deeper microservice transformation of cloud-native applications, Nginx Ingress is showing strain on issues such as complex routing-rule configuration, support for multiple application-layer protocols (Dubbo, QUIC, and so on), service access security, and traffic observability. In addition, Nginx Ingress applies configuration updates through a Reload, which causes transient interruptions when there are large numbers of long-lived connections; frequent configuration changes can therefore lose business traffic.
To meet users' strong demand for large-scale traffic management, the MSE cloud native gateway came into being. It is a next-generation gateway launched by Alibaba Cloud that is compatible with the standard Ingress specification and combines the traditional WAF gateway, traffic gateway, and microservice gateway into one. While providing users with fine-grained traffic-governance capabilities, it reduces resource costs by 50%. It supports service discovery via ACK container service, Nacos, Eureka, fixed addresses, FaaS, and more; supports multiple authentication and login methods to quickly build a security perimeter; provides a comprehensive, multi-perspective monitoring system including metrics, log analysis, and tracing; and can parse standard Ingress resources in both single- and multi-Kubernetes-cluster modes, helping users manage traffic uniformly and declaratively in cloud-native application scenarios. We have also introduced a WASM plug-in marketplace to meet users' customization needs.
Nginx Ingress vs. MSE Cloud Native Gateway
The following is a comparison of Nginx Ingress and the MSE cloud native gateway:
Smooth Migration
The MSE cloud native gateway is hosted by Alibaba Cloud: it is free of O&M, reduces costs, is rich in features, and is deeply integrated with surrounding Alibaba Cloud products. The following figure shows how to migrate seamlessly from Nginx Ingress to the MSE cloud native gateway; other Ingress Providers can follow the same approach.
Hands-on Practice
Next, we will walk through hands-on operations with the MSE cloud native gateway as the Ingress Provider, based on Alibaba Cloud Container Service for Kubernetes (ACK), to learn how to manage cluster ingress traffic with the MSE Ingress Controller.
Operation document address:
https://help.aliyun.com/document_detail/426544.html
Prerequisites
Install MSE Ingress Controller
We can find ack-mse-ingress-controller in the application marketplace of Alibaba Cloud Container Service and complete the installation by following the documentation for that component.
Create an MSE Cloud Native Gateway via CRD
MseIngressConfig is a CRD resource provided by the MSE Ingress Controller, which uses it to manage the life cycle of MSE cloud native gateway instances. One MseIngressConfig corresponds to one MSE cloud native gateway instance; to use multiple gateway instances, create multiple MseIngressConfig resources. For simplicity of presentation, we create the gateway with a minimal configuration:
apiVersion: mse.alibabacloud.com/v1alpha1
kind: MseIngressConfig
metadata:
  name: test
spec:
  name: mse-ingress
  common:
    network:
      vSwitches:
      - "vsw-bp1d5hjttmsazp0ueor5b"
Configure a standard Kubernetes IngressClass to associate with the MseIngressConfig. Once the association is complete, the cloud native gateway starts watching the Ingress resources in the cluster that reference this IngressClass:
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  annotations:
    ingressclass.kubernetes.io/is-default-class: 'true'
  name: mse
spec:
  controller: mse.alibabacloud.com/ingress
  parameters:
    apiGroup: mse.alibabacloud.com
    kind: MseIngressConfig
    name: test
We can track progress through the status of the MseIngressConfig, which moves through Pending -> Running -> Listening. The statuses are described as follows:
- Pending: Indicates that the cloud native gateway is being created and needs to wait for about 3 minutes.
- Running: Indicates that the cloud native gateway is successfully created and is running.
- Listening: Indicates that the cloud native gateway is running and listening to Ingress resources in the cluster.
- Failed: Indicates that the cloud native gateway is in an invalid state. Check the Message in the Status field for details.
Grayscale Release in Practice
Assume the cluster has a backend service, httpbin, and we want to perform grayscale verification by Header during a version upgrade, as shown in the figure:
First deploy the v1 and v2 versions of httpbin by applying the following resources to the ACK cluster:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-httpbin-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-httpbin-v1
  template:
    metadata:
      labels:
        app: go-httpbin-v1
        version: v1
    spec:
      containers:
      - image: specialyang/go-httpbin:v3
        args:
        - "--port=8090"
        - "--version=v1"
        imagePullPolicy: Always
        name: go-httpbin
        ports:
        - containerPort: 8090
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-httpbin-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-httpbin-v2
  template:
    metadata:
      labels:
        app: go-httpbin-v2
        version: v2
    spec:
      containers:
      - image: specialyang/go-httpbin:v3
        args:
        - "--port=8090"
        - "--version=v2"
        imagePullPolicy: Always
        name: go-httpbin
        ports:
        - containerPort: 8090
---
apiVersion: v1
kind: Service
metadata:
  name: go-httpbin-v1
spec:
  ports:
  - port: 80
    targetPort: 8090
    protocol: TCP
  selector:
    app: go-httpbin-v1
---
apiVersion: v1
kind: Service
metadata:
  name: go-httpbin-v2
spec:
  ports:
  - port: 80
    targetPort: 8090
    protocol: TCP
  selector:
    app: go-httpbin-v2
Create the Ingress resource that publishes the stable version v1:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin
spec:
  ingressClassName: mse
  rules:
  - host: test.com
    http:
      paths:
      - path: /version
        pathType: Exact
        backend:
          service:
            name: go-httpbin-v1
            port:
              number: 80
Then publish the Ingress resource for the grayscale version v2:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    "nginx.ingress.kubernetes.io/canary": "true"
    "nginx.ingress.kubernetes.io/canary-by-header": "stage"
    "nginx.ingress.kubernetes.io/canary-by-header-value": "gray"
  name: httpbin-canary-header
spec:
  ingressClassName: mse
  rules:
  - host: test.com
    http:
      paths:
      - path: /version
        pathType: Exact
        backend:
          service:
            name: go-httpbin-v2
            port:
              number: 80
Test Verification
# test the stable version
curl -H "host: test.com" <your ingress ip>/version
# result
version: v1
# test the grayscale version
curl -H "host: test.com" -H "stage: gray" <your ingress ip>/version
# result
version: v2
The above shows how we use Ingress Annotations to extend the standard Ingress with advanced traffic-management capabilities such as grayscale release.
Closing Words
The MSE cloud native gateway aims to provide users with a more reliable, lower-cost, and more efficient enterprise-grade gateway product that complies with the Kubernetes Ingress standard. For more release details, head over to the live broadcast:
https://yqh.aliyun.com/live/detail/28477
The MSE cloud native gateway offers two payment modes, pay-as-you-go and subscription (monthly/annual), and supports 10 regions: Hangzhou, Shanghai, Beijing, Shenzhen, Zhangjiakou, Hong Kong, Singapore, US (Virginia), US (Silicon Valley), and Germany (Frankfurt), with more regions opening gradually. The purchase link for the cloud native gateway is here.
Buy any specification of the prepaid MSE cloud native gateway now and enjoy a 30% discount, for both new and existing users.
You can also search for group number 34754806 on DingTalk to join the user group for discussion and Q&A.
Click here to visit the MSE official website and grab the deal!