Author: Yin Hang

As the industry's first fully managed, Istio-compatible service mesh product, Alibaba Cloud Service Mesh (ASM) has kept its architecture aligned with the community and with industry trends from the very beginning, and kept it independent of any single cluster. ASM is a customized implementation based on community Istio, providing component capabilities for fine-grained traffic management and security management on the managed control-plane side. The managed mode decouples the lifecycle management of the Istio components from that of the managed Kubernetes cluster, making the architecture more flexible and the system more scalable. From April 1, 2022, Alibaba Cloud Service Mesh (ASM) officially launches its commercial version, providing richer capabilities, larger-scale support, and better technical support to meet customers' different demand scenarios. For details, please click the original link.

Foreword

A service mesh is a platform that simplifies service governance through the "Sidecar" model. The whole mesh can be divided into a "control plane", which contains the core component Istiod, and a "data plane", which consists of the Sidecar of each service. If you have used a service mesh, you probably already have some feel for these concepts.

In the Istio service mesh, we know that each Sidecar is an Envoy application with a complete configuration that includes listeners, routes, clusters, secrets, and other parts; Envoy adjusts its own behavior according to this configuration, implementing the various means of traffic control. The main task of the control-plane component is to "translate" mesh resources such as VirtualService and DestinationRule into concrete Envoy configuration, and to deliver that configuration to each Envoy on the data plane. We call this process "configuration push".

Within that configuration, the cluster part corresponds to Envoy's concept of a "cluster": a group of logically similar upstream hosts that Envoy connects to. In fact, most of the time a cluster corresponds to a specific service.
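
To make this concrete, here is a rough, hand-abbreviated sketch (an assumption of the shape, not exact output) of what one dynamically delivered cluster might look like in an Envoy config dump, rendered as YAML. Istio's "outbound|9080||..." naming convention encodes direction, port, subset, and service FQDN:

 # A dynamically delivered EDS cluster as it might appear in a config dump;
# the actual endpoint addresses are fetched separately via EDS over ADS.
name: outbound|9080||productpage.test.svc.cluster.local
type: EDS
eds_cluster_config:
  eds_config:
    ads: {}
  service_name: outbound|9080||productpage.test.svc.cluster.local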

If we actually look at a complete Envoy configuration, we will find that even when we have done nothing, Envoy already contains some dynamically delivered clusters; in other words, the Sidecar automatically discovers services and records them as clusters. This raises a core question: how is the "service discovery" of a service mesh accomplished? We usually use "in-mesh service" to refer to a service whose Pods have the Sidecar injected; so does the service mesh only add these sidecar-injected "in-mesh services" to the cluster configuration of each Sidecar?

Today's article will mainly talk about the "service discovery" and "configuration push" of a service mesh.

"Service Discovery" for Service Meshes

A simple little experiment can answer these questions. We install the service mesh platform in a Kubernetes cluster (you can use Alibaba Cloud ASM + ACK to quickly set one up), and then create a test namespace in addition to the default namespace.

Then we enable automatic sidecar injection for the default namespace, so that its services automatically become "in-mesh" services, while the services in the test namespace get no sidecar injected and remain "out-of-mesh" services.
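
For reference, in community Istio this namespace-level switch is just a label on the namespace; the ASM console operation below achieves the same effect:

 # Community Istio: mark the default namespace for automatic sidecar injection
kubectl label namespace default istio-injection=enabled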

In the service mesh ASM, we can easily turn automatic injection on and off on the console's "Global Namespace" management page (don't forget to synchronize automatic injection after toggling it):


We can deploy whatever services we like in these two namespaces. Here I deployed a sleep service into the default namespace. The service is exactly what it sounds like: it contains a curl container and spends all its time sleeping ( ̄o ̄).zZ , so you can exec into its Pod and conveniently curl other services.

In Istio's official community GitHub repository, you can find the YAML file for the sleep sample service at istio/samples/sleep/sleep.yaml. Copy this file and execute:

 kubectl apply -f sleep.yaml

In the same way, we can deploy any service in the test namespace. Here I deployed bookinfo, an old friend of Istio users. Its YAML file can also be found in Istio's official GitHub repository, at istio/samples/bookinfo/platform/kube/bookinfo.yaml. Copy it and execute:

 kubectl apply -f bookinfo.yaml -n test

By the way, so that sleep doesn't get too lonely, I also deployed an httpbin application in the default namespace to keep it company, though it doesn't really matter if you skip it ┐(゚~゚)┌.

Now everything is ready, and you can probably guess what comes next. Let's take a look at what cluster configurations are inside this Sidecar.

If you use Istio, you can use the istioctl command-line tool to conveniently view a summary of each configuration in the sidecar. That won't work against the service mesh ASM, but ASM has a partially compatible command-line tool, asmctl: go to the Alibaba Cloud Help Center, search for asmctl, find "Install and use the diagnostic tool asmctl", and download and install the tool as the documentation describes.

Taking asmctl as an example, you can view the cluster configuration inside the Sidecar like this (the kubeconfig of the data-plane cluster needs to be configured in advance):

 $ asmctl proxy-config cluster sleep-557747455f-g4lcs 
SERVICE FQDN                                                      PORT      SUBSET     DIRECTION     TYPE             DESTINATION RULE
                                                                  80        -          inbound       ORIGINAL_DST     
BlackHoleCluster                                                  -         -          -             STATIC           
InboundPassthroughClusterIpv4                                     -         -          -             ORIGINAL_DST     
InboundPassthroughClusterIpv6                                     -         -          -             ORIGINAL_DST     
PassthroughCluster                                                -         -          -             ORIGINAL_DST     
agent                                                             -         -          -             STATIC           
asm-validation.istio-system.svc.cluster.local                     443       -          outbound      EDS              
controlplane-metrics-aggregator.kube-system.svc.cluster.local     443       -          outbound      ORIGINAL_DST     
details.test.svc.cluster.local                                    9080      -          outbound      EDS              
envoy_accesslog_service                                           -         -          -             STRICT_DNS       
heapster.kube-system.svc.cluster.local                            80        -          outbound      EDS              
httpbin.default.svc.cluster.local                                 8000      -          outbound      EDS              
istio-ingressgateway.istio-system.svc.cluster.local               80        -          outbound      EDS              
istio-ingressgateway.istio-system.svc.cluster.local               443       -          outbound      EDS              
istio-sidecar-injector.istio-system.svc.cluster.local             443       -          outbound      EDS              
istio-sidecar-injector.istio-system.svc.cluster.local             15014     -          outbound      EDS              
istiod.istio-system.svc.cluster.local                             15012     -          outbound      ORIGINAL_DST     
kiali.istio-system.svc.cluster.local                              20001     -          outbound      EDS              
kube-dns.kube-system.svc.cluster.local                            53        -          outbound      EDS              
kube-dns.kube-system.svc.cluster.local                            9153      -          outbound      EDS              
kubernetes.default.svc.cluster.local                              443       -          outbound      EDS              
metrics-server.kube-system.svc.cluster.local                      443       -          outbound      EDS              
productpage.test.svc.cluster.local                                9080      -          outbound      EDS              
prometheus_stats                                                  -         -          -             STATIC           
ratings.test.svc.cluster.local                                    9080      -          outbound      EDS              
reviews.test.svc.cluster.local                                    9080      -          outbound      EDS              
sds-grpc                                                          -         -          -             STATIC           
sleep.default.svc.cluster.local                                   80        -          outbound      EDS              
storage-crd-validate-service.kube-system.svc.cluster.local        443       -          outbound      EDS              
storage-monitor-service.kube-system.svc.cluster.local             11280     -          outbound      EDS              
xds-grpc                                                          -         -          -             STATIC           
zipkin                                                            -         -          -             STRICT_DNS

Here is something interesting: although none of the bookinfo applications have a sidecar injected, we can still find the service information of bookinfo services such as productpage, reviews, and ratings in sleep's sidecar.

How is all this accomplished? The official Istio documentation explains this process in the "Traffic Management" article:

In order to direct traffic within your mesh, Istio needs to know where all your endpoints are, and which services they belong to. To populate its own service registry, Istio connects to a service discovery system. For example, if you've installed Istio on a Kubernetes cluster, then Istio automatically detects the services and endpoints in that cluster.

In short, the Istio service mesh does not do service discovery itself. All services are discovered through the mesh's underlying platform (Kubernetes, Consul, etc.; Kubernetes in most cases) and, after adaptation by the control plane, recorded in the mesh's own service registry (that is, the pile of cluster configurations in the sidecars).
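
On Kubernetes, "connects to a service discovery system" simply means watching the cluster's own Service and Endpoints objects, which we can list directly:

 # The control plane's registry is fed by the platform's own objects; the
# Services and Endpoints in the test namespace are among what Istiod watches:
kubectl get services,endpoints -n test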

Likewise, if you look at the endpoints recorded in the sidecar, you will find the IP addresses of all Pods in the Kubernetes cluster, regardless of whether those Pods have a sidecar injected.
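
You can check this yourself. Assuming asmctl mirrors istioctl's proxy-config subcommands (the Pod name and cluster name below are the ones from this demo), the endpoints behind the sidecar-less productpage cluster can be listed like so:

 # List the endpoints Envoy knows for the productpage service (no sidecar there):
asmctl proxy-config endpoint sleep-557747455f-g4lcs \
    --cluster "outbound|9080||productpage.test.svc.cluster.local"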

Try deploying the following VirtualService in the test namespace:

 apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  namespace: test
  name: test-vs
spec:
  hosts:
    - productpage.test.svc.cluster.local
  http:
    - match:
        - uri:
            prefix: /test/
      rewrite:
        uri: /
      route:
        - destination:
            host: productpage.test.svc.cluster.local
            port:
              number: 9080

This VirtualService implements a URI rewrite, but its destination host, productpage, is an "out-of-mesh" service.

Try it: curl a path that will be rewritten, /test/productpage:

 kubectl exec -it sleep-557747455f-g4lcs -c sleep -- curl productpage.test:9080/test/productpage

You will find that the rewrite takes effect and the request returns normally.

The behavior above shows that the so-called "inside and outside the mesh" is merely a distinction determined by whether a Sidecar is injected; it does not mean the mesh and the services outside it have a strict isolation boundary. In the example above, the VirtualService actually takes effect in the route configuration of the request sender's Sidecar. Even though the productpage service has no Sidecar, it is still discovered by the service mesh control plane and added to the cluster configuration, so it has a corresponding cluster, and the mesh can of course set routing rules for that cluster. As long as the request sender's Pod has the Sidecar injected, the VirtualService works normally.
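
To see this for yourself, again assuming asmctl mirrors istioctl's proxy-config subcommands, you can dump the route configuration for port 9080 from sleep's sidecar and find the rewrite rule there:

 # The rewrite lives in the sender's sidecar, in the route config for port 9080:
asmctl proxy-config route sleep-557747455f-g4lcs --name 9080 -o json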

Of course, in actual use, other service mesh resources may take effect in the inbound traffic configuration, and those do require the request receiver to have a sidecar injected as well. But sidecar injection plays no decisive role in the mesh's service discovery; that much is certain.

"Config Push" for Service Meshes

Above, we explored the "service discovery" mechanism of the service mesh, which can fairly be called ingenious: it saves the service mesh from implementing a redundant service discovery mechanism of its own, and it makes it easy to connect the mesh to different underlying platforms. However, a closer look reveals a problem and a hidden, counterintuitive reality.

Take the applications we deployed into the Kubernetes cluster in the test above as an example. The sleep application and the bookinfo application are two independent applications that have nothing to do with each other. In a real production environment, I believe many users do something similar: relying on the isolation of Kubernetes namespaces, they deploy multiple applications in the same cluster to serve external traffic, each application containing several microservices, with the applications relatively independent of one another. And only some of the applications need the governance capabilities of the service mesh (multi-language service interoperability, blue-green release, etc.), so only those have the Sidecar injected.

The problem is that the control plane of the service mesh cannot assume anything about the relationships between user services either, so it still has to watch all services and endpoints in the cluster equally :-(

What's more uncomfortable, the control plane must also promptly synchronize the latest service information in the cluster to every sidecar on the data plane, which leads to the following "thankless" situation:

1. At first, everyone lives in peace and quiet.

2. Service2 scales out!

3. Istiod pushes a new configuration.

4. Then something awkward happens: Service1 and Service2 are actually two independent applications, and the Sidecars in the mesh have no need to record any information about Service2.

If there are not many services in the mesh, this causes no huge problem; the mesh's control-plane components just do some useless work. But as the saying goes, quantitative change leads to qualitative change, and the accumulated useless work becomes a problem too big to ignore. If hundreds of services are deployed in your cluster and only a few of them have joined the mesh, the mesh's control plane will be flooded with invalid information, the load on the control-plane components will stay high, and they will keep pushing information the sidecars don't need.

Since the control-plane component is not a gateway on the request path and is only responsible for pushing configuration to the sidecars, service mesh users may find it hard to notice that the control-plane load is too high. But once the load does get too high (the SLB in front of Pilot exceeds its limits, control-plane components restart, and so on), users face the predicament of new routing configuration being pushed too slowly and newly created sidecar-injected Pods failing to start. So while enjoying the convenient traffic governance that the service mesh brings, we also need to pay appropriate attention to the physical and mental health of the mesh control plane and optimize its configuration push.

Configuration Push Optimization: Selective Service Discovery

For the situation described above, we can configure the service mesh manually so that its control plane only "discovers" services in specific namespaces and ignores the others. In the community, Istio has provided a "discovery selectors" capability since version 1.10. The service mesh ASM provides the equivalent capability of "selective service discovery", and the two take effect through exactly the same mechanism.
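
For reference, in community Istio this is the discoverySelectors field of MeshConfig; a minimal sketch via an IstioOperator overlay might look like this (the in-mesh label is the one we apply in the next step):

 # Community Istio equivalent: the control plane only watches namespaces
# matching these label selectors.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    discoverySelectors:
      - matchLabels:
          in-mesh: "yes"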

Let's take "Selective Service Discovery" of service mesh ASM as an example. First, put a specific label on the namespace where the application in the grid is located to distinguish the namespace where the application in the grid and outside the grid are located (note that the namespace of the data plane is labeled):

 # Here, the default namespace has automatic injection enabled and is an "in-mesh" namespace
kubectl label namespace default in-mesh=yes
# We could actually just reuse the existing istio-injection=enabled label instead of adding a new one

Go to our ASM instance management page and select "Configuration Push Optimization -> Selective Service Discovery" in the left menu.

In the "Selective Service Discovery" page, select "Add Label Selector" -> "Add Label Exact Match Rule", and enter the label we just typed.


Then click "OK", it's very convenient. After this the grid goes through a short update phase, updating the configuration we just wrote down, and then re-enters the "running" state.

Let's run our first command again and look at the cluster configuration in the sidecar of the sleep service:

 $ asmctl proxy-config cluster sleep-557747455f-g4lcs   
SERVICE FQDN                             PORT     SUBSET     DIRECTION     TYPE             DESTINATION RULE
                                         80       -          inbound       ORIGINAL_DST     
BlackHoleCluster                         -        -          -             STATIC           
InboundPassthroughClusterIpv4            -        -          -             ORIGINAL_DST     
InboundPassthroughClusterIpv6            -        -          -             ORIGINAL_DST     
PassthroughCluster                       -        -          -             ORIGINAL_DST     
agent                                    -        -          -             STATIC           
envoy_accesslog_service                  -        -          -             STRICT_DNS       
httpbin.default.svc.cluster.local        8000     -          outbound      EDS              
kubernetes.default.svc.cluster.local     443      -          outbound      EDS              
prometheus_stats                         -        -          -             STATIC           
sds-grpc                                 -        -          -             STATIC           
sleep.default.svc.cluster.local          80       -          outbound      EDS              
xds-grpc                                 -        -          -             STATIC           
zipkin                                   -        -          -             STRICT_DNS

You can see that there is no longer any information about the bookinfo services in the sleep service's sidecar, and the configuration in the sidecar now looks much leaner. Congrats, congrats (^o^)/

Of course, we did more than just reduce the amount of configuration in the Sidecar. As mentioned above, after "selective service discovery" is enabled, the control plane no longer watches any services outside the default namespace, so its workload is greatly reduced and the service mesh returns to an efficient operating state~ We can also deploy other services into the cluster without any worries.

Summary

The service discovery mechanism of a service mesh is actually a very simple thing. After all, a service mesh has no service discovery mechanism of its own at all!

Yet without actually digging in, few people can say directly how the services recorded in each Sidecar are discovered. At the same time, Istiod, the core component responsible for configuration push in the service mesh, does its rule-translating and configuration-pushing work behind the scenes, which also makes it hard for mesh users to realize what the control-plane components are doing, or whether existing practices place an excessive, redundant burden on them.

I hope this article helps more service mesh users realize that optimizing configuration push is also an important part of maintaining a mesh platform. With "selective service discovery", you can greatly optimize the mesh's configuration push in just a minute and remove unnecessary hidden risks from using the mesh. Isn't that great?

For most users, selective service discovery is enough to optimize configuration push. If you want to take things to the extreme, you can use the service mesh's Sidecar resources directly to trim each sidecar's configuration as far as possible. On top of the Sidecar resource mechanism, the service mesh ASM also provides Sidecar resources automatically recommended from access-log analysis, to help customers simplify their usage in depth. Welcome to Service Mesh ASM, which brings you simplified, non-intrusive service governance in no time!
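
For a taste of what that looks like, here is a minimal sketch of a community Istio Sidecar resource, under the assumption (ours, for this demo) that services in the default namespace only ever talk to their own namespace and to istio-system; it restricts the egress visibility of every sidecar in default accordingly:

 apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: default
spec:
  egress:
    - hosts:
        - "./*"            # services in the sidecar's own namespace
        - "istio-system/*" # control-plane and gateway services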


