Author|Zeng Yuxing (
review & proofreading: Zeng Yuxing (
Editing & Typesetting: Wen Yan
Background
Under a microservice architecture, building a complete test system to verify new business functions is time-consuming and labor-intensive, and it grows harder as the number of microservices increases. The machine cost of such a complete test system is rarely low, and the system must be maintained separately so that functional correctness can be verified efficiently before a new application version goes online. When the business becomes large and complex, multiple sets are often needed. This is a common and difficult cost and efficiency challenge across the industry. If functional verification before a new version launches could be completed within the same production system, the savings in manpower and money would be considerable.
In addition to functional verification in the development phase, introducing grayscale release in the production environment helps control the risk and blast radius of bringing a new software version online. Grayscale release routes production traffic with certain characteristics, or a certain proportion of it, to the service version under verification, so that you can observe whether the new version behaves as expected.
Alibaba Cloud ASM Pro (see the end of the article for related links) provides a full-link grayscale solution built on Service Mesh that can help solve the problems in both scenarios.
ASM Pro product functional architecture diagram:
The core capabilities used here are the extended traffic marking, route-by-label, and traffic fallback capabilities shown in the figure above. Each is described in detail below.
Scenario description
Common scenarios for full-link grayscale release are as follows:
Taking Bookinfo as an example, ingress traffic carries the expected tag group. The sidecar reads the expected tag from the request context (Header or Context) and routes the traffic to the corresponding tag group. If that tag group does not exist, the traffic falls back to the base group by default; the specific fallback strategy is configurable. The implementation details are described below.
Ingress traffic is generally tagged at the gateway level, typically by a tagging plug-in. For example, requests whose userid falls within a certain range are given a grayscale tag. Because gateway choices and implementations vary widely in practice, the gateway implementation is outside the scope of this article.
Below we focus on how to mark traffic across the full link and achieve full-link grayscale based on ASM Pro.
Implementation principle
Inbound refers to request traffic sent to the App, and Outbound refers to request traffic that the App initiates toward other services.
The figure above shows a typical traffic path of a business application after the mesh is enabled: the business application receives an external request p1 and then calls the interface of another service that it depends on. The request path is p1->p2->p3->p4, where p2 is the Sidecar's forwarding of p1 and p4 is the Sidecar's forwarding of p3. To achieve full-link grayscale, both p3 and p4 need to obtain the traffic label from p1 so that the request is routed to the back-end service instance corresponding to that label, and p3 and p4 must carry the same label. The key is to make label propagation completely transparent to the application, so that the label passes through the whole link. This is the core technology of full-link grayscale. ASM Pro implements it on top of the traceId used in distributed tracing technology (for example, OpenTracing and OpenTelemetry).
In distributed tracing, a traceId uniquely identifies a complete call chain. Every fan-out call issued by an application on the link carries the originating traceId, which is propagated by the distributed tracing SDK. The ASM Pro full-link grayscale solution builds on this practice, which is widely adopted in distributed application architectures.
In the figure above, the inbound and outbound traffic seen by the Sidecar are originally completely independent. The Sidecar cannot perceive any correspondence between them, nor does it know whether one inbound request caused multiple outbound requests. In other words, the Sidecar does not know whether the requests p1 and p3 in the figure are related.
In the ASM Pro full-link grayscale solution, the requests p1 and p3 are associated through the traceId, which specifically relies on the x-request-id trace header in the Sidecar. The Sidecar maintains an internal mapping table that records the correspondence between traceId and label. When the Sidecar receives the p1 request, it stores the traceId and label from the request in this table. When it receives a p3 request, it looks up the label for the traceId in the mapping table and adds it to the p4 request, so that the entire link is marked and routed according to the label. The following figure roughly illustrates this principle.
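The mapping table described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ASM Pro's implementation: the class name LabelTable, its method names, and the use of a header called trafficLabel to carry the label are all hypothetical; only x-request-id comes from the article.

```python
from typing import Dict


class LabelTable:
    """Hypothetical stand-in for the Sidecar's traceId -> label mapping table."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def on_inbound(self, headers: Dict[str, str]) -> None:
        """p1: remember the label carried by an inbound request."""
        trace_id = headers.get("x-request-id")
        label = headers.get("trafficLabel")  # assumed header name for the label
        if trace_id and label:
            self._labels[trace_id] = label

    def on_outbound(self, headers: Dict[str, str]) -> Dict[str, str]:
        """p3 -> p4: copy the request headers and attach the remembered label."""
        forwarded = dict(headers)
        label = self._labels.get(headers.get("x-request-id", ""))
        if label is not None:
            # Keep a label the application already set; otherwise add ours.
            forwarded.setdefault("trafficLabel", label)
        return forwarded


table = LabelTable()
table.on_inbound({"x-request-id": "t-1", "trafficLabel": "gray"})
p4_headers = table.on_outbound({"x-request-id": "t-1"})
```

A real table would also need to evict entries once a trace completes; the sketch leaves that out.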
In other words, the full-link grayscale function of ASM Pro depends on distributed tracing. An application that does not yet use distributed tracing will need some modification before it can use this function. For Java applications, one option is to use a Java Agent with AOP so that the traceId is passed transparently from inbound to outbound requests without changing business code.
Implementing traffic marking
ASM Pro introduces a new TrafficLabel CRD that defines where the Sidecar obtains the traffic label it needs to pass through. The YAML example below defines the source of the traffic label and specifies that the label is stored in OpenTracing (specifically, the x-trace header). The traffic label is named trafficLabel; its value is taken first from $getContext(x-request-id) and, failing that, from $(localLabel) in the local environment.
apiVersion: istio.alibabacloud.com/v1beta1
kind: TrafficLabel
metadata:
  name: default
spec:
  rules:
  - labels:
    - name: trafficLabel
      valueFrom:
      - $getContext(x-request-id)  # if you use Aliyun ARMS, the corresponding header is x-b3-traceid
      - $(localLabel)
    attachTo:
    - opentracing
    # Protocols the rule applies to: empty means none, "*" means all
    protocols: "*"
The CR definition has two parts: how the label is obtained and how it is stored.
- Obtaining logic: The traffic label is first read from the field defined in the protocol context or header (the Header part). If it is not there, the Sidecar looks it up by traceId in its locally recorded map, which stores the traffic label for each traceId. If a mapping is found, the traffic is marked with that traffic label. If not, the traffic label is set to localLabel, the label of the environment in which the workload is deployed locally. The corresponding label name is ASM_TRAFFIC_TAG, and in an actual deployment it can be set through the CI/CD system.
- Storage logic: attachTo specifies the field in the protocol context where the label is stored. For example, HTTP uses the Header field and Dubbo uses the RPC context. The exact field is configurable.
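The obtaining logic above is a three-step lookup, which the following Python sketch makes explicit. It is a minimal illustration, not the Sidecar's code: the function name resolve_label, its parameters, and the trafficLabel header name are assumptions made for the example.

```python
from typing import Mapping


def resolve_label(
    headers: Mapping[str, str],
    trace_labels: Mapping[str, str],
    local_label: str,
    label_header: str = "trafficLabel",   # assumed name of the label field
    trace_header: str = "x-request-id",
) -> str:
    """Return the traffic label for a request, in the order described above."""
    # 1. A label carried explicitly in the protocol header or context wins.
    label = headers.get(label_header)
    if label:
        return label
    # 2. Otherwise look the traceId up in the locally recorded map.
    trace_id = headers.get(trace_header)
    if trace_id and trace_id in trace_labels:
        return trace_labels[trace_id]
    # 3. Otherwise fall back to the local deployment's label (ASM_TRAFFIC_TAG).
    return local_label


from_header = resolve_label({"trafficLabel": "gray"}, {}, "base")
from_map = resolve_label({"x-request-id": "t-1"}, {"t-1": "dev-x"}, "base")
from_local = resolve_label({"x-request-id": "t-9"}, {"t-1": "dev-x"}, "base")
```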
With TrafficLabel defined, we know how to mark traffic and pass labels along, but that alone is not enough for full-link grayscale. We also need to route on the trafficLabel, that is, "route by label", together with logic such as route fallback so that traffic can degrade gracefully when the route destination does not exist.
Route by traffic label
This function is implemented by extending Istio's VirtualService and DestinationRule.
Define Subset in DestinationRule
Each custom subset corresponds to a value of trafficLabel.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp/*
  subsets:
  - name: myproject        # project environment
    labels:
      env: abc
  - name: isolation        # isolation environment
    labels:
      env: xxx             # machine group
  - name: testing-trunk    # trunk environment
    labels:
      env: yyy
  - name: testing          # daily environment
    labels:
      env: zzz
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: myapp
spec:
  hosts:
  - myapp/*
  ports:
  - number: 12200
    name: http
    protocol: HTTP
  endpoints:
  - address: 0.0.0.0
    labels:
      env: abc
  - address: 1.1.1.1
    labels:
      env: xxx
  - address: 2.2.2.2
    labels:
      env: zzz
  - address: 3.3.3.3
    labels:
      env: yyy
A subset can be specified in two ways:
- labels matches the nodes (endpoints) of the application that carry a specific label;
- ServiceEntry specifies the IP addresses that belong to a specific subset. Note that this differs from the label-based logic: the addresses need not come from a registry (K8s or other) and are instead specified directly in configuration. This suits a Mock environment, whose nodes are not registered with the service registry.
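To make the relationship between subsets and endpoints concrete, here is a small Python sketch that selects the endpoints whose labels match a subset. The data is copied from the DestinationRule and ServiceEntry example; the function endpoints_for_subset and the dictionary layout are hypothetical and only illustrate the matching idea, not how the mesh resolves subsets internally.

```python
from typing import Dict, List

# Subset name -> required labels, as in the DestinationRule example.
SUBSETS: Dict[str, Dict[str, str]] = {
    "myproject": {"env": "abc"},
    "isolation": {"env": "xxx"},
    "testing-trunk": {"env": "yyy"},
    "testing": {"env": "zzz"},
}

# Statically configured endpoints, as in the ServiceEntry example.
ENDPOINTS: List[dict] = [
    {"address": "0.0.0.0", "labels": {"env": "abc"}},
    {"address": "1.1.1.1", "labels": {"env": "xxx"}},
    {"address": "2.2.2.2", "labels": {"env": "zzz"}},
    {"address": "3.3.3.3", "labels": {"env": "yyy"}},
]


def endpoints_for_subset(subset: str) -> List[str]:
    """Return the addresses whose labels contain every label of the subset."""
    wanted = SUBSETS[subset]
    return [
        ep["address"]
        for ep in ENDPOINTS
        if all(ep["labels"].get(key) == value for key, value in wanted.items())
    ]


testing_addresses = endpoints_for_subset("testing")        # ["2.2.2.2"]
trunk_addresses = endpoints_for_subset("testing-trunk")    # ["3.3.3.3"]
```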
Route based on subset in VirtualService
1) Global default configuration
- The route part can specify multiple destinations in sequence, and traffic is distributed among them in proportion to their weight values.
- A fallback strategy can be specified for each destination. case identifies the condition under which the fallback is executed; its values are noinstances (no service resources) and noavailabled (service resources exist but the service is unavailable). target specifies the target environment of the fallback. If no fallback is specified, the request is forced to execute in the destination's environment.
- For the route-by-label logic, we modified VirtualService so that subset supports the placeholder $trafficLabel. The placeholder indicates that the target environment is taken from the traffic label of the request, which corresponds to the definition in the TrafficLabel CR.
The global default mode corresponds to a swimlane: each environment is closed off, with a fallback strategy specified at the environment level. Each custom subset corresponds to a value of trafficLabel.
The configuration example is as follows:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: default-route
spec:
  hosts:                        # applies to all applications
  - "*/*"
  http:
  - name: default-route
    route:
    - destination:
        subset: $trafficLabel
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: "*/*"
        subset: testing-trunk   # trunk environment
      weight: 0
      fallback:
        case: noavailabled
        target: testing
    - destination:
        subset: testing         # daily environment
      weight: 0
      fallback:
        case: noavailabled
        target: mock
    - destination:
        host: "*/*"
        subset: mock            # Mock center
      weight: 0
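The fallback chain in the global default configuration above can be traced with a short Python sketch. Several things here are assumptions rather than documented ASM Pro behavior: the function name resolve_subset, the health map that describes each subset as "ok", "noinstances", or "noavailabled", the rule that a subset missing from the map has no instances, and in particular the rule that a fallback fires only when the observed condition equals the configured case exactly.

```python
from typing import Dict, Tuple

# Configured subset -> (case, target), mirroring the example above.
FALLBACKS: Dict[str, Tuple[str, str]] = {
    "$trafficLabel": ("noinstances", "testing-trunk"),
    "testing-trunk": ("noavailabled", "testing"),
    "testing": ("noavailabled", "mock"),
}


def resolve_subset(traffic_label: str, health: Dict[str, str]) -> str:
    """Follow fallback rules starting from the subset named by the label."""
    current = traffic_label
    rule = FALLBACKS.get("$trafficLabel")  # rule of the placeholder destination
    seen = set()
    while True:
        # Assumption: a subset missing from the health map has no instances.
        condition = health.get(current, "noinstances")
        if condition == "ok" or rule is None or rule[0] != condition:
            return current  # healthy, or no matching fallback: stay here
        if current in seen:
            return current  # guard against a fallback cycle
        seen.add(current)
        current = rule[1]
        rule = FALLBACKS.get(current)


# dev-x has no instances and trunk is unavailable, so traffic ends in testing.
chosen = resolve_subset("dev-x", {"testing-trunk": "noavailabled", "testing": "ok"})
```

Under the exact-match assumption, a request for dev-x with nothing deployed anywhere stops at testing-trunk, because that destination's fallback is configured only for noavailabled.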
2) Personal development environment customization
- Traffic goes to the daily environment first, and then to the trunk environment when the daily environment has no service resources.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: projectx-route
spec:
  hosts:                        # applies only to myapp
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: dev-x            # dev environment: x
    route:
    - destination:
        host: myapp/*
        subset: testing         # daily environment
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: myapp/*
        subset: testing-trunk   # trunk environment
      weight: 0
3) Support weight configuration
For traffic that carries the trunk environment label and originates from the dev-x environment, 80% goes to the trunk environment and 20% goes to the daily environment. When the trunk environment has no available service resources, the traffic falls back to the daily environment.
sourceLabels is the label of the local workload.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dev-x-route
spec:
  hosts:                        # applications this applies to (multiple applications are not supported)
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: testing-trunk    # trunk environment label
      sourceLabels:
      - exact: dev-x            # traffic comes from a project environment
    route:
    - destination:
        host: myapp/*
        subset: testing-trunk   # 80% of traffic goes to the trunk environment
      weight: 80
      fallback:
        case: noavailabled
        target: testing
    - destination:
        host: myapp/*
        subset: testing         # 20% of traffic goes to the daily environment
      weight: 20
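The 80/20 split in this example amounts to a weighted random choice between destinations. The Python sketch below illustrates only that idea; the function pick_weighted and the (subset, weight) tuples are hypothetical, fallback handling is left out, and it says nothing about how the Sidecar actually balances traffic.

```python
import random
from typing import List, Optional, Tuple


def pick_weighted(
    routes: List[Tuple[str, int]],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one subset at random, in proportion to the configured weights."""
    rng = rng or random.Random()
    subsets = [subset for subset, _ in routes]
    weights = [weight for _, weight in routes]
    return rng.choices(subsets, weights=weights, k=1)[0]


ROUTES = [("testing-trunk", 80), ("testing", 20)]

# With a fixed seed, the share of trunk picks over many requests is close to 0.8.
rng = random.Random(0)
picks = [pick_weighted(ROUTES, rng) for _ in range(10_000)]
trunk_share = picks.count("testing-trunk") / len(picks)
```

A destination with weight 0, as in the earlier examples, is never picked directly and is reached only through fallback.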
Route by (environment) label
This solution relies on service deployments carrying the relevant identifier (in the example, the label ASM_TRAFFIC_TAG: xxx), which is usually the environment identifier. The identifier can be understood as meta-information about the service deployment, and it depends on the upstream deployment (CI/CD) system. The schematic diagram is as follows:
- In the K8s scenario, it is enough to attach the corresponding environment/group label automatically during service deployment, that is, to use K8s itself as the metadata management center.
- Non-K8s scenarios can integrate through the microservices' service registry or a metadata server.
Note: ASM Pro developed the ServiceDiretory component in-house (see the ASM Pro product functional architecture diagram). It connects to multiple registries and dynamically obtains deployment meta-information.
Extended application scenarios
The following is a typical use of traffic marking and route-by-label: managing multiple development environments. The Dev X environment of each developer only needs to deploy the services whose versions have changed. When joint debugging with other developers is needed, fallback can be configured to send service requests to the corresponding development environment, as in the figure below, where B in the Dev Y environment calls C in the Dev X environment.
In the same way, the Dev X environment can be treated as the online grayscale version environment, which solves the problem of full-link grayscale release in the online environment.
Summary
The "traffic marking" and "route by label" capabilities introduced in this article form a general solution. Built on service mesh technology, it addresses related problems such as test environment management and online full-link grayscale release, and it is independent of the development language. The solution also adapts to different Layer 7 protocols and currently supports HTTP/gRPC and Dubbo.
Other vendors also offer full-link grayscale solutions. Compared with them, the ASM Pro solution has the following advantages:
- Supports multiple languages and multiple protocols.
- Uses a unified configuration template, TrafficLabel, whose configuration is simple and flexible and supports multiple levels (global, namespace, and pod).
- Supports routing fallback for graceful degradation.
The "traffic marking" and "route by label" capabilities can also be used in other related scenarios:
- Performance stress testing before a big promotion. In online stress testing, a common way to isolate stress-test data from real online data is to use shadow message queues, caches, and databases. This requires traffic marking, which distinguishes test traffic from production traffic by tag. Of course, it also requires the Sidecar to support middleware such as Redis and RocketMQ.
- Unitized routing. A common unitized routing scenario uses meta-information in the request traffic, such as uid, to look up the corresponding unit through configuration. In this scenario, the TrafficLabel definition can be extended with a function that derives a "unit label", which is used to mark the traffic and then route it to the corresponding service unit.
Related Links:
1) Alibaba Cloud ASM Pro:
https://servicemesh.console.aliyun.com/