Introduction
The first article of the Cloud Native Application Load Balancing series, "Cloud Native Application Load Balancing Selection Guide", introduced the ingress traffic management scenarios and solutions for cloud-native container environments. With Envoy as the data plane proxy and Istio as the control plane, Istio Ingress Gateway came out as the best solution for cloud-native application traffic management in a comprehensive comparison covering multi-cluster traffic distribution, security, observability, and heterogeneous platform support.
At the access layer, we need to configure routing rules, traffic behaviors (timeouts, redirection, rewriting, retries, etc.), load balancing algorithms, circuit breakers, and cross-origin traffic management rules. Based on Istio Ingress Gateway, this article introduces the principles of these capabilities and demonstrates them from the perspectives of ingress traffic distribution, fault tolerance, and high-availability scheduling.
Component introduction
Istio Ingress Gateway can be used as an Ingress Controller 12 . It is composed of the data plane (the Envoy network proxy 1 ) and the control plane (Istiod 13 ), and its Pods are deployed to the Kubernetes cluster as a Deployment by default.
Service discovery
The Istio Ingress Gateway control plane, Istiod, can connect to various service discovery systems (Kubernetes, Consul, DNS) to obtain endpoint information, or we can use ServiceEntry to register services and their corresponding endpoints manually. Taking a common Kubernetes cluster as an example, Istiod obtains the Kubernetes Services and their corresponding endpoints from the API Server, watches them, and updates the mapping automatically in real time. After obtaining the service discovery information, Istiod converts it into the data plane's standard data format and pushes it, via the Envoy xDS API, to the Envoy network proxies on the data plane that actually forward the traffic.
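For the manual case, a ServiceEntry can register a service that Istiod cannot discover automatically. The following is a minimal sketch, assuming a hypothetical external host legacy-api.example.com backed by two VM endpoints (host and addresses are illustrative only):
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-api
spec:
  hosts:
  - legacy-api.example.com   # hypothetical host outside the mesh
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  endpoints:
  - address: 10.0.0.12       # hypothetical VM endpoints
  - address: 10.0.0.13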
It is worth noting that Envoy, the data plane of the Istio Ingress Gateway, is deployed in the Kubernetes cluster as separate Pods. When only Istio's north-south traffic management capability is used, there is no need to inject the Istio data plane sidecar into business Pods: the Envoy in the Ingress Gateway Pods can carry all ingress traffic management capabilities, because most of Istio's ingress traffic management functions are implemented on the client side of the call, which in this case is the Istio Ingress Gateway.
Istio traffic management model and API introduction
Istio has designed its own traffic management API, implemented mainly through the Gateway, VirtualService, and DestinationRule CRs (Kubernetes Custom Resources 2 ).
- Gateway : configures listening rules, including port, protocol, host, SSL, etc.
- VirtualService : configures routing rules, including match conditions, traffic behaviors, destination services/versions, etc.
- DestinationRule : configures service versions (subsets), load balancing, connection pools, and health check policies.
We interact with the API Server of the cluster where the Istio control plane is installed and submit the above CRs to configure traffic management rules. Istiod obtains the Istio API objects from the cluster's API Server, converts these configurations into the Envoy data plane's standard data format, and pushes them to the data plane (Istio Ingress Gateway) through the xDS interface; the data plane then forwards traffic according to the corresponding rules.
Ingress traffic management practice
The following uses Istio Ingress Gateway as an example to introduce the practice of ingress traffic distribution, fault tolerance, and high availability scheduling.
Environment preparation
To prepare the environment, you can use the "one-click experience" function of the TCM console, which automatically prepares the initial environment: TCM + two cross-AZ Kubernetes clusters + the TCM demo.
We prepare an environment of an Istio Ingress Gateway (demonstrated with Tencent Cloud Service Mesh, TCM) + a Kubernetes cluster (demonstrated with Tencent Cloud Container Service, TKE): first create a service mesh instance, create an Istio Ingress Gateway in the mesh instance, and then add a TKE cluster as the mesh's service discovery cluster.
Deploy the TCM demo 3 in the cluster without injecting the Envoy sidecar into the business Pods. The demo is an e-commerce website composed of six microservices developed in different languages. The following is the complete structure of the demo:
In this demonstration, ingress traffic management uses the user, product, and cart applications of the demo and exposes the APIs they provide through the istio-ingressgateway for clients to call.
Ingress traffic distribution
Application release
The business needs to expose the APIs provided by multiple back-end modules for clients to call, so gateway routing rules must be configured to route requests for the path /product to the product service, requests for /cart to the cart service, and requests for /user to the user service.
In the TCM demo, the product service provides the /product interface to obtain the list of products on sale, the user service provides the /user interface to obtain user information, and the cart service provides the /cart interface to obtain the shopping cart contents. Next, we configure the Istio Ingress Gateway to expose these interfaces according to the request path.
1. First, obtain the IP of the Ingress Gateway through kubectl and export it as the variable $INGRESS_HOST for convenient reference later.
$ export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
2. Use a Gateway to configure the listening rules of the Ingress Gateway: open port 80 with the HTTP protocol, without configuring SSL for now.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: apis-gw
spec:
  servers:
  - port:
      number: 80
      name: HTTP-80-6wsk
      protocol: HTTP
    hosts:
    - '*'
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
3. Use a VirtualService to configure routing rules. Fill the gateways parameter with the Gateway created in the previous step, default/apis-gw, and add HTTP route match rules that route requests for /product to the product service, /user to the user service, and /cart to the cart service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: apis-vs
spec:
  hosts:
  - '*'
  gateways:
  - default/apis-gw
  http:
  - match:
    - uri:
        exact: /product
    route:
    - destination:
        host: product.base.svc.cluster.local
  - match:
    - uri:
        exact: /user
    route:
    - destination:
        host: user.base.svc.cluster.local
  - match:
    - uri:
        exact: /cart
    route:
    - destination:
        host: cart.base.svc.cluster.local
4. Use curl to verify the above configuration. The JSON string returned by each API request is piped to jq to extract the service information; the requests are routed to the different services as expected.
$ curl http://$INGRESS_HOST/product | jq '.info[0].Service'
...
"product-v1"
$ curl http://$INGRESS_HOST/cart | jq '.Info[1].Service'
...
"cart-v1"
$ curl http://$INGRESS_HOST/user | jq '.Info[0].Service'
...
"user-v1"
Grayscale release
The business needs to upgrade the product service. The new version of the image is ready and the new version Pods can be deployed. The upgrade on the server side should be smooth and low-risk: traffic is shifted to the new version gradually by percentage, and after grayscale verification shows that the new version has no problems, the old version is taken offline.
The following is the process of configuring the Ingress Gateway for the gray release of the product service:
1. Use a DestinationRule to define the versions (subsets) of the product service, using label key-value matching to define the correspondence between the subsets and the endpoints of the service.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: product
  namespace: base
spec:
  host: product
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  exportTo:
  - '*'
2. Use the VirtualService to configure the gray release routing strategy. Initially, the v2 version receives 0% of the traffic and the v1 version receives 100%.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: apis-vs
spec:
  hosts:
  - '*'
  gateways:
  - default/apis-gw
  http:
  - match:
    - uri:
        exact: /product
    route:
    - destination:
        host: product.base.svc.cluster.local
        subset: v2
      weight: 0
    - destination:
        host: product.base.svc.cluster.local
        subset: v1
      weight: 100
  - match:
    - uri:
        exact: /user
    route:
    - destination:
        host: user.base.svc.cluster.local
  - match:
    - uri:
        exact: /cart
    route:
    - destination:
        host: cart.base.svc.cluster.local
3. Deploy the v2 version of the TCM demo product service (Deployment) 5 to the cluster, then simulate 10 requests to the product service. The results show that at the beginning of the release all traffic is still routed to the v1 version.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/product | jq '.info[0].Service'; done
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
"product-v1"
4. Modify the VirtualService to adjust the weights of the product v1 and v2 subsets to 50 and 50 respectively (see the sketch below), then simulate 10 requests to the product service. As expected, the ratio of calls to the v2 and v1 versions is close to 1:1.
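For reference, the adjusted /product route might look like the following fragment (only this route is shown; the rest of the apis-vs VirtualService is unchanged):
  - match:
    - uri:
        exact: /product
    route:
    - destination:
        host: product.base.svc.cluster.local
        subset: v2
      weight: 50
    - destination:
        host: product.base.svc.cluster.local
        subset: v1
      weight: 50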
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/product | jq '.info[0].Service'; done
"product-v1"
"product-v2"
"product-v1"
"product-v2"
"product-v1"
"product-v2"
"product-v1"
"product-v2"
"product-v1"
"product-v1"
5. After grayscale verification is complete, modify the gray release routing rules again: set the v1 and v2 subset weights in the VirtualService to 0 and 100 respectively, routing 100% of /product requests to the product v2 version. Simulate another 10 requests for verification; all requests are routed to product v2, and the gray release is complete.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/product | jq '.info[0].Service'; done
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
"product-v2"
Session persistence
The same back-end service is generally served by multiple instances (Pods). Usually, ingress traffic needs to be load-balanced across the multiple back-end instances of a service (such as the product service); in that case a load balancing algorithm such as round robin, random, or least connections is configured so that the back-end instances handle traffic in a balanced way.
In some specific cases, requests from the same user need to be routed to the same back-end instance so that services that require session persistence (such as the cart shopping-cart service) work properly. There are two types of session persistence:
- Simple IP-based session persistence: requests from the same IP address are treated as coming from the same user and routed to the same back-end service instance. It is simple to implement, but it cannot support scenarios in which multiple clients access the back-end services through a proxy; in that case the same IP address does not represent the same user.
- Session persistence based on cookies (or other layer-7 information): when a user requests for the first time, the edge gateway inserts a cookie and returns it; subsequent client requests carry the cookie, and the edge gateway routes the traffic to back-end service instances according to the cookie.
Istio Ingress Gateway can configure session persistence based on the source IP, a cookie, or an HTTP header, but these policies take effect only for HTTP traffic. Below we configure IP-based and cookie-based session persistence for the cart service.
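Header-based session persistence is not demonstrated below, but it uses the same consistentHash mechanism. A minimal sketch, assuming clients send a UserID request header (the header name here is only an example):
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cart
  namespace: base
spec:
  host: cart
  trafficPolicy:
    loadBalancer:
      consistentHash:
        # hash on the value of the UserID request header (example header name)
        httpHeaderName: UserID
  exportTo:
  - '*'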
If the open source Istio Ingress Gateway + CLB is used, the following problems exist when configuring edge gateway traffic management rules:
- Manually modifying the associated Service rules : after configuring an Ingress Gateway port with a Gateway, you also need to manually configure the corresponding port rule of the LoadBalancer Service associated with the Ingress Gateway.
- Unable to obtain the real source IP : when the default container network mode is used, the istio-ingressgateway is exposed to the public network through a LoadBalancer Service. After traffic passes through the CLB to a node's NodePort, kube-proxy performs SNAT and DNAT on the original request, so when the request reaches the istio-ingressgateway, the source IP is no longer the real client IP.
Using TCM Ingress Gateway + TKE cluster can avoid the above problems:
- TCM automatically synchronizes Gateway port configuration to the Ingress Gateway's Kubernetes Service and the associated CLB, so we can use the Gateway CR to manage the ports of Ingress Gateway instances in a consistent manner.
- By setting externalTrafficPolicy: Local on the istio-ingressgateway Service, traffic is not NAT-forwarded between nodes and the real client IP is retained. We can also add the annotation service.kubernetes.io/local-svc-only-bind-node-with-pod: "true" so that the CLB backend is bound only to nodes that run an istio-ingressgateway Pod, avoiding the health-check failures caused by binding backends to nodes without Pod instances; and the annotation service.cloud.tencent.com/local-svc-weighted-balance: "true" lets the CLB perform weighted load balancing based on the number of Pods on each back-end node, avoiding the load imbalance caused by different numbers of Pod instances on different nodes 6 . A sketch of such a Service configuration is shown below.
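A minimal sketch of what the istio-ingressgateway Service could look like with these settings (the selector, ports, and target ports are illustrative and depend on the actual gateway deployment):
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    # bind the CLB backend only to nodes that run an istio-ingressgateway Pod
    service.kubernetes.io/local-svc-only-bind-node-with-pod: "true"
    # weight CLB backends by the number of gateway Pods on each node
    service.cloud.tencent.com/local-svc-weighted-balance: "true"
spec:
  type: LoadBalancer
  # keep traffic on the node that received it, preserving the client source IP
  externalTrafficPolicy: Local
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443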
IP-based session persistence :
1. Now that the Ingress Gateway can obtain the real client IP, we configure simple IP-based load balancing for the cart service through a DestinationRule:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cart
  namespace: base
spec:
  host: cart
  trafficPolicy:
    loadBalancer:
      consistentHash:
        useSourceIp: true
  exportTo:
  - '*'
2. Check the current Pods of the cart service; a total of 3 Pods are deployed.
$ kubectl get deployment cart -n base
NAME READY UP-TO-DATE AVAILABLE AGE
cart 3/3 3 3 4d23h
3. Simulate 10 /cart requests and extract the information of the Pod serving each request. All requests are routed to the same Pod, so the IP-based session persistence configuration works.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/cart | jq '.Info[1].Pod'; done
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
"cart-855f9d75ff-x47bq"
Cookie-based session persistence :
1. Modify the DestinationRule of the cart service to configure cookie-based load balancing. The cookie name is cookie, and the cookie TTL is 900000 ms (900 s).
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cart
  namespace: base
spec:
  host: cart
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: cookie
          ttl: 900000ms
  exportTo:
  - '*'
2. Initiate a first /cart request and note the cookie value and Pod information returned by the Ingress Gateway, then use this cookie to simulate 10 more /cart requests. The first request and the subsequent 10 requests are all routed to the same Pod (cart-855f9d75ff-dqg6b in this example), so the cookie-based session persistence configuration takes effect.
$ curl http://$INGRESS_HOST/cart -i
...
set-cookie: cookie="bc0e96c66ff8994b"; Max-Age=900; HttpOnly
...
Pod":"cart-855f9d75ff-dqg6b"
...
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/cart -b 'cookie=bc0e96c66ff8994b' | jq '.Info[1].Pod'; done
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
"cart-855f9d75ff-dqg6b"
Fault tolerance
Connection pool management
The connection pool is an important configuration for keeping a distributed (service-oriented) system stable. When one of the services in a distributed system is at risk of failure because of a sudden increase in requests, quickly returning failure responses and applying back pressure to downstream callers as early as possible can effectively prevent an avalanche across the whole system. Through connection pool settings we can configure TCP/HTTP connection and request thresholds for the services that need them; when a threshold is reached, new traffic is rejected with an error, which effectively protects the stability of the service 7 .
Below we configure the connection pool of the user service:
1. First, we deploy a set of curl Pods (30 replicas) to simulate concurrent requests to the user service. Limited by the operating environment of each Pod, the actual concurrency will be somewhat lower than 30.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: curl
    qcloud-app: curl
  name: curl
  namespace: default
spec:
  replicas: 30
  selector:
    matchLabels:
      k8s-app: curl
      qcloud-app: curl
  template:
    metadata:
      labels:
        k8s-app: curl
        qcloud-app: curl
    spec:
      containers:
      - args:
        - -c
        - while true; do curl http://$INGRESS_HOST/user; done
        command:
        - /bin/sh
        image: docker.io/byrnedo/alpine-curl
        imagePullPolicy: IfNotPresent
        name: curl
2. Use a DestinationRule to configure the connection pool of the user service. To make the effect easy to observe, set the maximum number of pending HTTP/1.1 requests to 1 and the maximum number of HTTP/2 requests to 1.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user
  namespace: base
spec:
  host: user
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        http2MaxRequests: 1
  exportTo:
  - '*'
3. Check the user service monitoring on the Grafana dashboard. After the connection pool is configured, most of the requests return the 503 Service Unavailable status code and the client request success rate drops sharply: the overload protection of the user service is in effect, and the DestinationRule connection pool configuration plays a strong role in protecting the stability of the server.
Health check
When a back-end service instance (Pod) fails while processing traffic (continuously returning errors, the success rate dropping below a threshold, etc.), the Ingress Gateway needs a policy that excludes the failed endpoints from the healthy load balancing pool, ensuring that client calls are handled by back-end instances in a normal state.
In addition, the locality-aware load balancing function also requires outlier detection to be enabled so that it can perceive the health of the endpoints in each locality and decide the traffic scheduling strategy.
The Outlier Detection of the Ingress Gateway (Envoy) is a passive health check: when an endpoint shows behaviors such as consecutive 5xx errors (HTTP) or connection timeouts/failures (TCP), it is recognized as an outlier and ejected from the load balancing pool for a period of time 8 . Next, we configure the health check policy of the user service.
1. First, we deploy a set of Pods that return a 503 error for /user requests, serving as unhealthy endpoints of the user service. After the deployment is complete, check the endpoints of the user service: there is 1 healthy user Pod and 1 unhealthy user Pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-unhealthy
  namespace: base
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user
  template:
    metadata:
      labels:
        app: user
    spec:
      containers:
      - command:
        - sleep
        - "9000"
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: REGION
          value: shanghai-zone1
        image: docker.io/busybox
        imagePullPolicy: IfNotPresent
        name: user
        ports:
        - containerPort: 7000
$ kubectl get deployment -n base user user-unhealthy
NAME READY UP-TO-DATE AVAILABLE AGE
user 1/1 1 1 6d19h
user-unhealthy 1/1 1 1 3d20h
2. Outlier Detection (passive health check) is not yet configured for the user service, so unhealthy endpoints are not removed from the load balancing pool. As a result, some of the requests succeed (200 OK) and some fail (503 Service Unavailable).
$ for((i=0;i<10;i++)) do curl -I http://$INGRESS_HOST/user | grep HTTP/1.1; done
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 503 Service Unavailable
HTTP/1.1 200 OK
HTTP/1.1 503 Service Unavailable
HTTP/1.1 200 OK
HTTP/1.1 503 Service Unavailable
HTTP/1.1 200 OK
HTTP/1.1 503 Service Unavailable
HTTP/1.1 503 Service Unavailable
3. Configure the DestinationRule of the user service with Outlier Detection: scan every 10 seconds, eject endpoints with 3 or more consecutive errors from the load balancing pool for 30 seconds, with a maximum ejection percentage of 100% and a minimum health percentage of 0%. After completing the configuration, simulate requests to the user service again; all requests return 200 OK. (Since this is a passive health check, an unhealthy endpoint is ejected only after it has returned consecutive errors to actual requests.)
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user
  namespace: base
spec:
  host: user
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 3
      interval: 10000ms
      baseEjectionTime: 30000ms
      maxEjectionPercent: 100
      minHealthPercent: 0
  exportTo:
  - '*'
$ for((i=0;i<10;i++)) do curl -I http://$INGRESS_HOST/user | grep HTTP/1.1; done
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
Redirect
When an application is migrated to a new URI and the original link needs to remain available, HTTP redirection 9 needs to be configured. Redirection can be applied to the following scenarios:
- Migrate to the new URI during server maintenance/downtime
- Mandatory use of HTTPS protocol
- Using multiple domain names to expand the application's coverage of users
To ensure that requests accessing the back-end user service through the Ingress Gateway are forced to use the more secure HTTPS protocol, HTTP redirection must be configured on the Ingress Gateway.
Next, we configure a redirect that forces the use of HTTPS.
1. Use the Gateway to configure an HTTP-to-HTTPS redirect.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: apis-gw
spec:
  servers:
  - port:
      number: 80
      name: HTTP-80-h7pv
      protocol: HTTP
    hosts:
    - '*'
    tls:
      httpsRedirect: true
  - port:
      number: 443
      name: HTTPS-443-p1ph
      protocol: HTTPS
    hosts:
    - '*'
    tls:
      mode: SIMPLE
      credentialName: qcloud-$CERTIFICATE_ID
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
2. An HTTP request for /user now returns a 301 redirect status code. If the address is accessed in a browser, the browser re-initiates the request to the new URI when it receives the redirect.
$ curl http://$INGRESS_HOST/user -I | grep HTTP/1.1
HTTP/1.1 301 Moved Permanently
Rewrite
With redirection, the client is aware of the address change, and the redirected traffic actually makes two requests before the resource is reached, which carries some performance cost. A rewrite hides the address change from the client: the rewrite is performed entirely on the server side, so the address the client requests is decoupled from the address the server manages.
The API resources provided by the cart service of the TCM demo have changed: a /clear API that empties the shopping cart has been implemented. The goal is for /cart requests to actually call the /clear API without the client being aware of it.
Below we configure requests for /cart to be rewritten to /clear.
1. Use the VirtualService to rewrite the URI of /cart requests to /clear before the actual call is made.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: apis-vs
  namespace: default
spec:
  hosts:
  - 121.4.8.11
  gateways:
  - default/apis-gw
  http:
  - match:
    - uri:
        exact: /cart
    route:
    - destination:
        host: cart.base.svc.cluster.local
    rewrite:
      uri: /clear
2. Initiate a /cart request. The client is unaware of the rewrite, which is performed by the server; the actual call goes to the /clear API, and the cart service's /clear API returns a message indicating that the shopping cart was emptied successfully.
$ curl http://$INGRESS_HOST/cart -H 'UserID: 1'
{"Success":"true"}
High-availability scheduling
As the scale of the business grows, or as requirements for cross-availability-zone/cross-region disaster recovery, data compliance, and business isolation increase, the business will consider deploying multiple Kubernetes clusters, running the same service in clusters across availability zones/regions, and doing high-availability scheduling. There are two main demands:
- Automatic failover based on region & error awareness : the availability-zone/region distribution of traffic is determined by the locality information of the service and the health of its endpoints. When endpoint health is above the threshold, 100% of the traffic is routed locally; when it falls below the threshold, a percentage of the traffic is automatically failed over to other availability zones/regions according to the endpoint health, until, when all local endpoints are unhealthy, 100% of the traffic fails over to other availability zones/regions. Perceiving endpoint health depends on the health check capability.
- Region-aware traffic distribution (distribute) : traffic is not failed over automatically based on locality and error information; instead the administrator configures a custom cross-availability-zone/region multi-cluster traffic distribution strategy, for example distributing traffic arriving in Shanghai Zone 1 to Shanghai Zone 1 and Shanghai Zone 2 at a ratio of 80% to 20%. This does not need to perceive endpoint health and does not rely on the health check capability.
Automatic failover based on region & error awareness
As the back end of the TCM demo website grows, disaster recovery for the back-end services is also on the agenda. The goal is cross-cluster disaster recovery for the user service (here demonstrated across availability zones): a user backup service is deployed in a new cluster in Shanghai Zone 2, while traffic still enters through the ingress gateway in Shanghai Zone 1. When the user endpoints in Shanghai Zone 1 are all healthy, requests should be served nearby within Zone 1. When the health ratio of the user endpoints in Shanghai Zone 1 drops below a certain level (for example, 71.4%), a percentage of the traffic, depending on the health ratio, should start shifting to the user endpoints in Shanghai Zone 2, until the health of the user endpoints in Shanghai Zone 1 drops to 0% and the traffic is completely switched to the user backup in Shanghai Zone 2.
Environment preparation:
1. Add a TKE cluster in Shanghai Zone 2 (if you built the environment with the TCM one-click experience function, you can skip this step, since a service discovery cluster in the second availability zone is already prepared), and deploy the user service (replicas: 14) in this cluster as the disaster-recovery backup for the user service in Shanghai Zone 1.
apiVersion: v1
kind: Namespace
metadata:
  name: base
spec:
  finalizers:
  - kubernetes
---
apiVersion: v1
kind: Service
metadata:
  name: user
  namespace: base
  labels:
    app: user
spec:
  ports:
  - port: 7000
    name: http
  selector:
    app: user
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user
  namespace: base
  labels:
    app: user
spec:
  replicas: 14
  selector:
    matchLabels:
      app: user
  template:
    metadata:
      labels:
        app: user
    spec:
      containers:
      - name: user
        image: ccr.ccs.tencentyun.com/zhulei/testuser:v1
        imagePullPolicy: Always
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: REGION
          value: "shanghai-zone2"
        ports:
        - containerPort: 7000
2. Adjust the numbers of healthy and unhealthy user Pods in the original Shanghai Zone 1 cluster to 10 and 4 respectively. After the adjustment, the user service endpoints in Zone 1 and Zone 2 are as follows:
$ kubectl get deployment user user-unhealthy -n base
NAME READY UP-TO-DATE AVAILABLE AGE
user 10/10 10 10 8d
user-unhealthy 4/4 4 4 5d2h
$ kubectl get deployment user -n base
NAME READY UP-TO-DATE AVAILABLE AGE
user 14/14 14 14 5d2h
Next, we enable and test the locality-aware load balancing capability of the Istio Ingress Gateway.
1. Locality-aware load balancing is enabled by default in TCM; we only need to configure Outlier Detection for the user service for it to take effect. By default, once the health ratio of the user endpoints in Shanghai Zone 1 drops below 10/14 (71.4%), traffic starts shifting proportionally from Zone 1 to Zone 2.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user
  namespace: base
spec:
  host: user
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 3
      interval: 10000ms
      baseEjectionTime: 30000ms
      maxEjectionPercent: 100
      minHealthPercent: 0
  exportTo:
  - '*'
2. Traffic currently accesses the user service from the Ingress Gateway in Shanghai Zone 1, and the health ratio of the user endpoints in Shanghai Zone 1 has not fallen below the critical value of 10/14, so all traffic to the user service is served by the endpoints in Shanghai Zone 1. A set of requests is issued to verify this: all traffic is routed to the Zone 1 endpoints.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/user | jq '.Info[0].Region'; done
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
3. Adjust the ratio of healthy to unhealthy user endpoints in the Shanghai Zone 1 cluster to 5 healthy and 9 unhealthy. The health ratio of the user service in Shanghai Zone 1 is now 5/14, already below 10/14, so some traffic should be shifted to Shanghai Zone 2. A set of requests is issued to verify this: part of the /user traffic is routed to Shanghai Zone 2, and the ratio of traffic to Zone 1 versus Zone 2 is roughly 1:1.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/user | jq '.Info[0].Region'; done
"shanghai-zone2"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone1"
"shanghai-zone2"
4. Adjust Shanghai Zone 1 so that the healthy endpoints are 0 and the unhealthy endpoints are 14. The health ratio of the user endpoints in Shanghai Zone 1 is now 0% and Zone 1 has no serving capacity, so all /user requests should be routed to Shanghai Zone 2. A set of requests is issued to verify this: all traffic is routed to Shanghai Zone 2.
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/user | jq '.Info[0].Region'; done
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
Istio's default policy, when traffic needs to fail over, is to load-balance globally across all localities at the next priority level. If a service is deployed in more than two regions, you need to consider configuring failover priorities. For example, when the user service is deployed in Guangzhou, Shanghai, and Beijing, you may want traffic to fail over to the nearby Shanghai region rather than the farther Beijing region (instead of being load-balanced globally across Shanghai and Beijing) when the health ratio of the Guangzhou endpoints drops below the threshold. This is configured with the failover parameter of localityLbSetting, in the same way as the distribute strategy. When the user service is deployed in Guangzhou, Shanghai, and Beijing, the inter-region failover strategy can be configured as follows (a full DestinationRule sketch follows the fragment):
failover:
- from: gz
  to: sh
- from: sh
  to: gz
- from: bj
  to: sh
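A minimal sketch of how this fragment might sit in a complete DestinationRule, assuming gz, sh, and bj are the region labels of the corresponding clusters' nodes (failover relies on outlier detection, so it is included as well):
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user
  namespace: base
spec:
  host: user
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: gz
          to: sh
        - from: sh
          to: gz
        - from: bj
          to: sh
    # failover relies on outlier detection to know endpoint health
    outlierDetection:
      consecutiveErrors: 3
      interval: 10000ms
      baseEjectionTime: 30000ms
      maxEjectionPercent: 100
      minHealthPercent: 0
  exportTo:
  - '*'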
TL;DR: the following is supplementary background on how locality-aware load balancing obtains endpoint health and locality information and determines the proportion of traffic to shift.
To achieve cross-availability-zone/region multi-cluster locality-aware load balancing, and to fail traffic over proportionally based on the locality and health of service endpoints, the following capabilities are required:
1. Connect the multi-cluster network and discover the services of all clusters:
After the cross-cluster network is connected, Istiod can obtain service and endpoint information from the API Servers of multiple clusters and push it to the data plane Envoy proxies.
2. Get the locality information of the service:
To achieve nearby access and disaster recovery, the locality information of the service is required. In the service mesh, locality consists of three levels of information: region, zone, and subzone. The region and zone come from the topology.kubernetes.io/region label and the topology.kubernetes.io/zone label of the cluster node respectively; both labels are already provided in TKE clusters 10 . For example, the nodes in Shanghai Zone 1 carry topology.kubernetes.io/region: sh and topology.kubernetes.io/zone: "2000400001", while the nodes in Shanghai Zone 2 carry topology.kubernetes.io/region: sh and topology.kubernetes.io/zone: "200002". There is no concept of subzone in Kubernetes; Istio introduced the topology.istio.io/subzone label, which can be configured as required.
In the locality-prioritized load balancing strategy used by Istio by default, the priorities are as follows:
- Priority 0, the highest priority: same region, same zone;
- Priority 1: same region, different availability zone;
- Priority 2, the lowest priority: different region.
3. Get health information of service endpoints:
Obtaining endpoint health information depends on enabling Istio's health check: Outlier Detection. The region & error-aware automatic failover function relies on this health check; when it is not enabled, the data plane cannot know the health status of the service endpoints and by default load-balances traffic globally. Region-aware traffic distribution (distribute) does not rely on enabling health checks.
4. Determine the health status of the service & determine the traffic transfer ratio:
A service usually runs multiple replicas, and its health is not an absolute 0-or-1 state; traffic is transferred gradually. When the proportion of unhealthy endpoints exceeds a certain threshold, traffic starts shifting progressively. Istio's locality load balancing uses a locality-prioritized strategy by default: the control plane tells the data plane to prefer sending requests to the nearest healthy instances. When the endpoints of the highest-priority availability zone/region are 100% healthy, all requests are routed to that locality and no traffic is transferred; once the proportion of unhealthy endpoints exceeds a certain threshold, traffic begins to shift proportionally and gradually.
This threshold is controlled by Envoy's overprovisioning factor, which is 1.4 by default. Based on this factor and the health ratio of the service endpoints, the traffic proportions of the different locality priority levels can be determined. For example, suppose a service currently has endpoints in Guangzhou and Shanghai, and the traffic entry, the Ingress Gateway, is deployed in Guangzhou. For traffic that reaches the service through the Guangzhou Ingress Gateway, the Guangzhou endpoints are priority P0 and the Shanghai endpoints are priority P1. Assume that Shanghai is the disaster-recovery region and its endpoint health ratio stays at 100%, that the weights of the two regions are equal, and that the overprovisioning factor is the default 1.4. The traffic load ratio of the two regions is then calculated as follows:
- Health ratio of the Guangzhou service endpoints : P0_health = number of healthy endpoints in Guangzhou / total number of endpoints in Guangzhou;
- Traffic share of the Guangzhou service : P0_traffic = min(1, P0_health * 1.4);
- Traffic share of the Shanghai service : P1_traffic = 1 - P0_traffic = max(0, 1 - P0_health * 1.4).
According to this calculation rule, when the overprovisioning factor is 1.4:
- When the health ratio P0_health of the Guangzhou endpoints falls below 71.4%, traffic to the service starts switching to the Shanghai region;
- When the health ratio of the Guangzhou endpoints is 50%, 1 - 50% * 1.4 = 30% of the traffic is transferred to the service in Shanghai;
- When the Guangzhou endpoints are completely unavailable (P0_health = 0%), the traffic is switched entirely to the Shanghai region.
PX_traffic = min(100, PX_health * 1.4 * 100) reflects the current traffic-carrying capacity of a service in a given locality and is called the health score in the Envoy community. When the sum of the health scores of all localities is less than 100, Envoy considers the overall health insufficient to fully handle the requests; in that case, Envoy distributes requests in proportion to the health scores. For example, when the health scores of Guangzhou and Shanghai are 20 and 30 respectively, they carry 40% and 60% of the load 11 .
Region-aware traffic distribution (distribute)
The business does not want traffic to fail over automatically based on locality and health information, but rather wants a custom traffic distribution strategy: /user requests from the Istio Ingress Gateway (Shanghai Zone 1) should be distributed evenly at a 1:1 ratio between Zone 1 and Zone 2, instead of applying Istio's default automatic failover strategy, which routes 100% of requests from Shanghai Zone 1 to the user endpoints of Shanghai Zone 1 as long as they are fully healthy.
A custom region-aware traffic distribution strategy can be configured through the distribute parameter, either in meshConfig (mesh-global configuration) or in a DestinationRule (per-service configuration).
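For reference, the mesh-global variant might look roughly like the following sketch, assuming installation is managed through an IstioOperator resource; the distribute values simply mirror the per-service example in step 2 below:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # mesh-global locality load balancing settings (sketch)
    localityLbSetting:
      enabled: true
      distribute:
      - from: sh/2000400001/*
        to:
          sh/2000400001/*: 50
          sh/200002/*: 50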
1. Restore the numbers of healthy/unhealthy user endpoints in Shanghai Zone 1 and Zone 2 to the original state: all 14 endpoints in Zone 1 are healthy, and all 14 endpoints in Zone 2 are healthy. Under the Istio Ingress Gateway's default locality-aware strategy, all /user traffic from the Ingress Gateway (Shanghai Zone 1) would be routed to the endpoints in Shanghai Zone 1.
2. Configure the DestinationRule of the user service with a custom traffic scheduling rule: traffic from Shanghai Zone 1 is routed evenly to the endpoints of Shanghai Zone 1 and Zone 2.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user
  namespace: base
spec:
  host: user
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        distribute:
        - from: sh/2000400001/*
          to:
            sh/2000400001/*: 50
            sh/200002/*: 50
        enabled: true
    outlierDetection:
      consecutiveErrors: 3
      interval: 10000ms
      baseEjectionTime: 30000ms
      maxEjectionPercent: 100
      minHealthPercent: 0
  exportTo:
  - '*'
3. Initiate a set of /user requests for verification. Traffic is routed to the endpoints in Zone 1 and Zone 2 in a balanced manner, instead of following the Istio Ingress Gateway's default region/error-aware automatic failover behavior (which would route 100% of the traffic to Shanghai Zone 1).
$ for((i=0;i<10;i++)) do curl http://$INGRESS_HOST/user | jq '.Info[0].Region'; done
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone2"
"shanghai-zone1"
"shanghai-zone2"
"shanghai-zone1"
"shanghai-zone1"
"shanghai-zone2"
"shanghai-zone2"
Concluding remarks
This article introduced the technical principles and traffic management model of Istio Ingress Gateway traffic management, and, from the three aspects of ingress traffic distribution, fault tolerance, and high-availability scheduling, demonstrated content routing, weighted routing, load balancing, circuit breaking, automatic failover based on region & error awareness, and region-aware traffic distribution.
In addition to basic ingress traffic management, north-south traffic management also involves scenarios such as security, observability, and heterogeneous service support, which will be discussed in subsequent articles in this series.