Microservice full-link grayscale new capabilities

Author: Shimian, Xun Mu

Background

In a microservice architecture, the dependencies between services are intricate, and releasing a feature sometimes requires several services to be upgraded and launched at the same time. We want the new versions of these services to be verified together with a small share of traffic. This is the full-link grayscale scenario that is unique to microservice architectures. By building environment isolation from the gateway through the entire backend, several versions of a service can be verified in grayscale at once. During a release, we only need to deploy the grayscale version of each changed service. As traffic flows along the call link, the gateways, middleware, and microservices it passes through identify the grayscale traffic and dynamically forward it to the grayscale version of the corresponding service, as shown below:

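[Figure: full-link grayscale, with the traffic of each version shown in a different color from the gateway through the backend services]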

The figure above illustrates the effect of this scheme. Different colors represent the grayscale traffic of different versions. Both the microservice gateway and the microservices themselves need to identify the traffic and make dynamic decisions according to the governance rules. When a service version changes, forwarding along the call link changes in real time. Compared with a grayscale environment built from dedicated machines, this solution saves considerable machine cost and O&M effort, and it lets developers exercise fine-grained, full-link control over online traffic quickly and in real time.
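The per-hop decision described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, not MSE APIs, and falling back to the baseline version when a service has no instance for the traffic's tag is an assumed policy rather than something this article specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not MSE APIs.
class GrayRouter {
    static final String BASE = "base";

    // A service instance together with the version tag it was deployed under.
    static final class Instance {
        final String address;
        final String tag;

        Instance(String address, String tag) {
            this.address = address;
            this.tag = tag;
        }
    }

    // Returns the instances whose tag matches the traffic tag. If the service has
    // no instance of that version, falls back to baseline (an assumed policy).
    static List<Instance> select(List<Instance> all, String trafficTag) {
        List<Instance> matched = filter(all, trafficTag == null ? BASE : trafficTag);
        return matched.isEmpty() ? filter(all, BASE) : matched;
    }

    private static List<Instance> filter(List<Instance> all, String tag) {
        List<Instance> out = new ArrayList<>();
        for (Instance instance : all) {
            if (instance.tag.equals(tag)) {
                out.add(instance);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Instance> serviceA = List.of(
                new Instance("10.0.0.1", "base"), new Instance("10.0.0.2", "gray"));
        List<Instance> serviceB = List.of(new Instance("10.0.1.1", "base"));
        // Service A has a gray version, so gray traffic goes there.
        System.out.println(select(serviceA, "gray").get(0).address);
        // Service B has no gray version, so gray traffic uses the baseline.
        System.out.println(select(serviceB, "gray").get(0).address);
    }
}
```

Because every hop runs the same selection, only the services that actually changed need a grayscale deployment.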

The full-link grayscale capability strengthens the rapid iteration and stability verification that the microservice architecture offers, and it brings real benefits to systems in enterprise production environments. This article focuses on the application scenarios and pain points of full-link grayscale in MSE service governance, and on the new capabilities we have built on top of it.

Full link runtime white screen capability

When using full-link grayscale in production, we often run into the following problems.

Does the full-link grayscale traffic flow as expected, and is our traffic matched according to the grayscale rules we configured?
Grayscale traffic shows many slow calls and exceptions. How can we tell whether this is a business problem in the new version of the code or a system problem caused by something we overlooked in the traffic grayscale process? How can we locate the problem quickly and keep iteration efficient?
When designing a grayscale system, we need to decide how to mark grayscale traffic. Sometimes it is hard to find suitable traffic characteristics (parameters, headers, or other identifiers with business semantics). How can we mark traffic quickly in such a scenario?

These are problems we keep encountering while helping cloud customers adopt full-link grayscale. The runtime white screen capability is what we abstracted and designed in the process.

The purpose of the runtime white screen, that is, making runtime behavior visible in the console, is to give us insight into how full-link grayscale traffic is matched and how it behaves at runtime.

Building on traffic routing rules, we abstract the runtime white screen rule as follows:

WhiteScreenRule = Target + Action

Target:

ResourceTarget: target interface, supports Web, RPC, and custom methods

WorkloadTarget: target instance, you can select all machines or specify machine IP

TrafficCondition: whether to match only exceptions, slow calls, or traffic that carries a full-link grayscale label

Action:

Collect relevant contextual diagnostic information
Color the traffic on subsequent links
Print logs on subsequent links
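
The rule abstraction above can be sketched as a small data model. This is a hypothetical illustration only: the class and field names follow the terms used in this article but are not the MSE API, and the 500 ms slow-call threshold in the usage example is an assumption.

```java
import java.util.function.Predicate;

// Hypothetical data model: names follow the article's terms, not the MSE API.
class WhiteScreenRuleSketch {

    // One observed call on the link.
    static final class Call {
        final String resource;    // Web path or RPC method that was invoked
        final String instanceIp;  // machine that handled the call
        final boolean failed;     // true if the call ended with an exception
        final long costMillis;    // elapsed time of the call

        Call(String resource, String instanceIp, boolean failed, long costMillis) {
            this.resource = resource;
            this.instanceIp = instanceIp;
            this.failed = failed;
            this.costMillis = costMillis;
        }
    }

    // Target = ResourceTarget + WorkloadTarget + TrafficCondition.
    static final class Target {
        final String resource;
        final String instanceIp;                 // null selects all machines
        final Predicate<Call> trafficCondition;

        Target(String resource, String instanceIp, Predicate<Call> trafficCondition) {
            this.resource = resource;
            this.instanceIp = instanceIp;
            this.trafficCondition = trafficCondition;
        }

        boolean matches(Call call) {
            return resource.equals(call.resource)
                    && (instanceIp == null || instanceIp.equals(call.instanceIp))
                    && trafficCondition.test(call);
        }
    }

    // Action = what to do with matched traffic.
    static final class Action {
        final boolean collectDiagnostics;
        final String colorTag;     // null means the traffic is not colored
        final boolean printLog;

        Action(boolean collectDiagnostics, String colorTag, boolean printLog) {
            this.collectDiagnostics = collectDiagnostics;
            this.colorTag = colorTag;
            this.printLog = printLog;
        }
    }

    // WhiteScreenRule = Target + Action.
    static final class Rule {
        final Target target;
        final Action action;

        Rule(Target target, Action action) {
            this.target = target;
            this.action = action;
        }

        // Returns the action for a matching call, or null when the call is out of scope.
        Action actionFor(Call call) {
            return target.matches(call) ? action : null;
        }
    }

    public static void main(String[] args) {
        // Match only exceptions and slow calls; the threshold is an assumption.
        Predicate<Call> slowOrFailed = c -> c.failed || c.costMillis > 500;
        Rule rule = new Rule(new Target("/A/a", null, slowOrFailed),
                new Action(true, "gray", true));
        System.out.println(rule.actionFor(new Call("/A/a", "10.0.0.2", false, 800)) != null);
        System.out.println(rule.actionFor(new Call("/A/a", "10.0.0.2", false, 20)) != null);
    }
}
```

Keeping the traffic condition as a predicate means exceptions, slow calls, and label checks can be combined without changing the rule type.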

Let's look in detail at how the runtime white screen capability solves the problems we encountered with full-link grayscale.

Whether grayscale traffic is matched and flows as expected

For this scenario, we only need to configure a white screen matching rule on the Zuul entry application:

[Figure: white screen matching rule configured on the Zuul entry application]

We can then quickly observe the parameters, return values, headers, and other characteristic attributes of the grayscale traffic along the whole link. We can also quickly see whether the full link behaves as expected and, if it does not, locate the reason.

Full-link configuration grayscale

In addition to the grayscale of microservice instances and traffic, configuration items in microservice applications also need a grayscale capability, so that a grayscale application can use its own configuration values.

Microservice applications usually introduce a configuration center for configuration management. It provides dynamic configuration push so that applications can change their running logic without restarting. However, the configuration center manages only the configuration item itself and cannot perceive the environment of the service instance that fetches the configuration. In other words, it cannot tell whether a request comes from an instance in the formal environment or in a grayscale environment. If a configuration needs different values in the formal and grayscale environments, those values must be stored as different configuration items in the configuration center, and we may need to write code like this:

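 ...
// Pick the configuration item that belongs to the current environment.
// Compare string contents with equals(); in Java, == compares references.
if ("gray".equals(env)) {
    cfg = getConfig("cfg-1");
} else if ("gray2".equals(env)) {
    cfg = getConfig("cfg-2");
} else {
    cfg = getConfig("cfg-base");
}
...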

This scenario is very common in A/B testing. This kind of code is repeated again and again as configuration items and grayscale environments are added. In addition, a grayscale environment often contains several services, and each service has to maintain its own copy of similar code. The resulting situation is shown in the figure: the user application has to actively distinguish which value of the same configuration item to use in each environment.

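[Figure: each application distinguishes the configuration values of different environments in its own code]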

The root cause is that the configuration center cannot perceive the environment of the service instance, so the code has to do this job instead, and environment information intrudes into the business code.

To address this problem, the configuration label push function of MSE moves environment awareness in configuration management down to the platform side, where the agent takes responsibility for it. Users only need to connect their applications to MSE to use configuration push in the full-link grayscale scenario, which removes the cumbersome environment detection logic from the business code, as the figure shows:

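[Figure: with MSE configuration label push, the agent handles environment awareness instead of the business code]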
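
To make the idea concrete, here is a minimal sketch of resolving a configuration value by instance label on the platform side. It assumes a simple in-memory store; the class `LabeledConfigStore` and its methods are hypothetical and do not reflect how the MSE agent is actually implemented. The key `configValue` and the label `gray` are borrowed from the walkthrough later in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store; not the MSE agent's implementation.
class LabeledConfigStore {
    private final Map<String, String> baseValues = new HashMap<>();
    private final Map<String, Map<String, String>> labeledValues = new HashMap<>();

    // Sets the value that every instance sees by default.
    void setBase(String key, String value) {
        baseValues.put(key, value);
    }

    // Pushes a value that only instances carrying the given label should see.
    void pushByLabel(String key, String label, String value) {
        labeledValues.computeIfAbsent(key, k -> new HashMap<>()).put(label, value);
    }

    // Resolves a key for an instance: the label-specific value if one was
    // pushed, otherwise the baseline value.
    String get(String key, String instanceLabel) {
        Map<String, String> byLabel = labeledValues.get(key);
        if (byLabel != null && instanceLabel != null && byLabel.containsKey(instanceLabel)) {
            return byLabel.get(instanceLabel);
        }
        return baseValues.get(key);
    }

    public static void main(String[] args) {
        LabeledConfigStore store = new LabeledConfigStore();
        store.setBase("configValue", "initial");
        store.pushByLabel("configValue", "gray", "pushed");
        // Business code asks for one key; the label is supplied by the platform side.
        System.out.println(store.get("configValue", "gray"));
        System.out.println(store.get("configValue", null));
    }
}
```

With a lookup like this, business code requests a single configuration key, and the environment branching shown earlier disappears from the application.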

For specific operation steps, please refer to Microservice Governance Practical School: https://help.aliyun.com/practice_detail/447313

Step 1: Reproduce the online scenario

We will deploy four business applications, spring-cloud-zuul, spring-cloud-a, spring-cloud-b, and spring-cloud-c, together with the registry, Nacos Server. Their call link is as follows:

[Figure: call link of the demo applications]

These are minimal Spring Cloud and Dubbo applications. The project source code is available at the link below:

https://github.com/aliyun/alibabacloud-microservice-demo/tree/master/mse-simple-demo

Log in to the MSE Governance Center console, click Application Governance in the left navigation bar, enter cfg-spring-cloud-a, and click the search icon. Select the cfg-spring-cloud-a application card to enter the application details page.

Then select Application Configuration > Configuration List in the left navigation bar and click the + in front of the configValue switch. The configuration values of the baseline version and the gray version of application A are the same: both are the initial values in the application.

Step 2: Configure grayscale rules for the spring-cloud-a application

On the application details page of cfg-spring-cloud-a, select Traffic Management in the left navigation bar, and click Label Routing, as shown in the figure:

[Figure: Label Routing page of the application]

Next, we configure grayscale rules for the grayscale instances. On the gray label, click Traffic Rules > Add, configure the following grayscale rule, and click OK.

[Figure: grayscale rule configuration for the gray label]

Step 3: Verify the configuration grayscale

Next let's verify the configuration grayscale.

1. Perform label push on the grayscale instance

Go back to Application Configuration > Configuration List on the application details page of cfg-spring-cloud-a, and click Push by label next to the configValue switch. In the pop-up window, select the label gray and set the configuration value for the grayscale environment.

Then click Next: Value Comparison, and then click Label Push to complete the push. The console now shows that the configuration value of the grayscale instance has changed to the value we just set.

2. Verify that the configuration grayscale takes effect

Log in to the Container Service console, click Cluster in the left navigation bar, and enter the cluster where the application is deployed. On the cluster details page, choose Network > Services, find zuul-slb, and click its external endpoint to access the service invocation page.

Enter /A/a to access the baseline version of the A application, and you can see that its configuration values are still the initial values.

Enter /A/a?name=xiaoming to access the grayscale version of the A application through the grayscale rule you just configured. You can see that its configuration value has changed to the value we just pushed.
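
The two requests above differ only in the query parameter, which suggests a parameter-based rule. A minimal sketch of such a match follows, assuming the rule checks that the parameter `name` equals `xiaoming`; the class, its methods, and the tag names are hypothetical, and the real rule format is whatever was configured in the console.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter-based grayscale rule; not the console's rule format.
class GrayRuleSketch {

    // Parses "k1=v1&k2=v2" into a map. No URL decoding, which is enough here.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Returns "gray" when the request carries name=xiaoming, otherwise "base".
    static String tagFor(String uri) {
        int q = uri.indexOf('?');
        String query = q < 0 ? "" : uri.substring(q + 1);
        return "xiaoming".equals(parseQuery(query).get("name")) ? "gray" : "base";
    }

    public static void main(String[] args) {
        System.out.println(tagFor("/A/a"));
        System.out.println(tagFor("/A/a?name=xiaoming"));
    }
}
```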

3. Verify the persistence of grayscale configuration values

Configuration values pushed by label are persistent. Even if the application in the grayscale environment is restarted, it automatically obtains the previously pushed configuration values from the MSE Agent.

Log in to the Container Service console, click Cluster in the left navigation bar, and enter the cluster where the application is deployed. On the cluster details page, select Workload > Stateless. Select the spring-cloud-a-gray workload and click Batch Redeploy. You can also use the kubectl tool to redeploy the workload, which simulates restarting the grayscale application.

After the application restarts, repeat the previous step to access the grayscale application. The configuration value is still the one that was pushed earlier.

Summary

This article introduced two capabilities that extend full-link grayscale: the runtime white screen and configuration grayscale. Both round out the full-link grayscale scenario and make it easier to use. Full-link grayscale is an important scenario in microservice governance, and the full-link grayscale capability of MSE keeps expanding and iterating as customer scenarios deepen. We will keep investing in it, making it easier to use by making it deeper and more thorough, and there is still a lot left to explore. At present, nearly 100 enterprises use full-link grayscale. We have always believed that only products continuously polished by customers stay durable. If you are interested, you are welcome to try it.
