Author: ten sleep
Background
Microservice stability has always been a major concern for developers. As businesses evolve from monolithic to distributed architectures and deployment methods change, dependencies between services grow more complex, and business systems face serious high-availability challenges. During the epidemic, you may have experienced scenarios like these:
- When reserving masks online, instantaneous peak traffic pushed the system beyond its maximum load, and users could not place orders;
- During online course selection, too many course-selection requests were submitted at the same time, and the system could not respond;
- During online office work or teaching, too many users joined online conferences at the same time, and the conferences lagged.
Such drops in availability seriously hurt the user experience. We therefore need ways to guard against unstable factors in advance, and we also need the ability to stop losses quickly when traffic surges.
Flow control and degradation - an important part of ensuring the stability of microservices
Many factors affect the availability of microservices, and these unstable scenarios can have serious consequences. From the perspective of microservice traffic, they fall roughly into two common scenarios:
- The service's own traffic exceeds its carrying capacity, making it unavailable. For example, a traffic surge or a batch of dispatched tasks makes the service load soar, and requests cannot be processed normally.
Traffic is random and unpredictable. One second may be calm, and the next may bring a traffic peak (such as 0:00 on Double Eleven). However, system capacity is always limited. If burst traffic exceeds that capacity, requests may go unprocessed, queued requests may be processed slowly, CPU/Load may soar, and the system may finally crash. Therefore, we need to limit this kind of burst traffic so that the service handles as many requests as possible without being overwhelmed.
- The service becomes unavailable because it depends on other services that are unavailable. For example, our service may depend on several third-party services. If a payment service becomes abnormal and calls to it are very slow, and the caller does not guard against this effectively, the caller's thread pool fills up and the caller itself stops working. In a distributed system, call relationships form an intricate mesh, and the failure of one service may cascade and make the entire link unavailable.
A service often calls other modules, which may be another remote service, a database, or a third-party API. For example, making a payment may require a remote call to an API provided by UnionPay, and querying the price of a commodity may require a database query. However, the stability of such dependencies is not guaranteed. If a dependency is unstable and its response time grows, the response time of the calling method grows too, threads pile up, and eventually the caller's own thread pool may be exhausted, making the caller itself unavailable. Modern microservice architectures are distributed and consist of a very large number of services that call each other, forming complex call chains. The problems above are amplified along the link: if one node on a complex link is unstable, the failure may cascade and eventually make the entire link unavailable. Therefore, we need to circuit-break and degrade unstable services, temporarily cutting off unstable calls so that local instability does not trigger an overall avalanche.
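To make the idea of cutting off unstable calls concrete, here is a minimal sketch of a circuit breaker in plain Java. It is not Sentinel's implementation; the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the fixed open window are all assumptions chosen to keep the example short. For simplicity it also holds a lock during the call, which a real implementation would avoid.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker: opens after a run of consecutive failures. */
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before retrying
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the call, or returns the fallback while the breaker is open. */
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get();       // open: cut off the unstable call
        }
        try {
            T result = remoteCall.get(); // closed, or probing after the open window
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now;          // trip (or re-trip) the breaker
            }
            return fallback.get();       // degrade instead of propagating the failure
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

With a threshold of 2, the third and later calls to a failing dependency return the fallback immediately without touching the dependency, so the caller's threads are not tied up waiting on it.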
MSE service governance builds on the stability protection capabilities of Sentinel, Alibaba's flow control and degradation component. Taking traffic as the entry point, it helps ensure service stability across multiple dimensions, such as flow control, concurrency control, circuit breaking and degradation, hotspot protection, and system adaptive protection, and covers scenarios such as microservices, cloud native gateways, and service meshes.
Having introduced the scenarios and capabilities of flow control and degradation, let's turn to today's focus: the runtime dynamic Enhance capability. We will show how MSE service governance lets you apply flow control and degradation at any point with one click, including but not limited to Web, Rpc, SQL, Redis, and other access interfaces, arbitrary business methods, framework interfaces, and so on.
Runtime Enhance capability - one-click flow control and degradation at any point
How can we add flow control and degradation to an arbitrary method at runtime? A short demo illustrates this. The business code below is a simple Spring Boot application in which the a method is an ordinary internal method.
@SpringBootApplication
public class AApplication {
    public static void main(String[] args) {
        SpringApplication.run(AApplication.class, args);
    }
    @Api(value = "/", tags = {"Entry application"})
    @RestController
    class AController {
        ...
        @ApiOperation(value = "HTTP full-link grayscale entry", tags = {"Entry application"})
        @GetMapping("/a")
        public String restA(HttpServletRequest request) {
            return a(request);
        }
        private String a(HttpServletRequest request) {
            StringBuilder headerSb = new StringBuilder();
            Enumeration<String> enumeration = request.getHeaderNames();
            while (enumeration.hasMoreElements()) {
                String headerName = enumeration.nextElement();
                Enumeration<String> val = request.getHeaders(headerName);
                while (val.hasMoreElements()) {
                    String headerVal = val.nextElement();
                    headerSb.append(headerName + ":" + headerVal + ",");
                }
            }
            return "A" + SERVICE_TAG + "[" + inetUtils.findFirstNonLoopbackAddress().getHostAddress() + "]" + " -> " +
                    restTemplate.getForObject("http://sc-B/b", String.class);
        }
        ...
    }
}
At this point, the a method does not appear in monitoring. We can see only the restA interface, that is, the monitoring data of GET:/a, and only for it can we configure flow control and degradation rules.
With the open source approach, we need to add the Sentinel dependency to the code and give the com.alibabacloud.mse.demo.AApplication.AController#a method Sentinel capabilities, either through an annotation or through code:
// Annotation-based instrumentation; it is subject to the many limitations of AOP proxies
// Requires: com.alibaba.csp.sentinel.annotation.SentinelResource
@SentinelResource("com.alibabacloud.mse.demo.AApplication.AController:a")
private String a(HttpServletRequest request) {
    StringBuilder headerSb = new StringBuilder();
    Enumeration<String> enumeration = request.getHeaderNames();
    while (enumeration.hasMoreElements()) {
        String headerName = enumeration.nextElement();
        Enumeration<String> val = request.getHeaders(headerName);
        while (val.hasMoreElements()) {
            String headerVal = val.nextElement();
            headerSb.append(headerName + ":" + headerVal + ",");
        }
    }
    return "A" + SERVICE_TAG + "[" + inetUtils.findFirstNonLoopbackAddress().getHostAddress() + "]" + " -> " +
            restTemplate.getForObject("http://sc-B/b", String.class);
}
// SDK-based flow control and degradation; it requires intrusive changes to business code
// Requires: com.alibaba.csp.sentinel.Entry, com.alibaba.csp.sentinel.SphU,
//           com.alibaba.csp.sentinel.slots.block.BlockException
private String a(HttpServletRequest request) {
    Entry entry = null;
    try {
        entry = SphU.entry("HelloWorld");
        StringBuilder headerSb = new StringBuilder();
        Enumeration<String> enumeration = request.getHeaderNames();
        while (enumeration.hasMoreElements()) {
            String headerName = enumeration.nextElement();
            Enumeration<String> val = request.getHeaders(headerName);
            while (val.hasMoreElements()) {
                String headerVal = val.nextElement();
                headerSb.append(headerName + ":" + headerVal + ",");
            }
        }
        return "A" + SERVICE_TAG + "[" + inetUtils.findFirstNonLoopbackAddress().getHostAddress() + "]" + " -> " +
                restTemplate.getForObject("http://sc-B/b", String.class);
    } catch (BlockException ex) {
        // The call was rejected by a rule; return a degraded response so the method still compiles and returns
        System.err.println("blocked!");
        return "blocked";
    } finally {
        if (entry != null) {
            entry.exit();
        }
    }
}
Having to write code naturally brings drawbacks: we must add a dependency, change the code, and release the application again.
So how can we give com.alibabacloud.mse.demo.AApplication.AController#a flow control and degradation capabilities without writing a single line of code?
Configure runtime white screen rules
Configure a runtime white screen rule, that is, a rule set up visually in the console: select the custom embedding point type for the current application's interface, and fill in the class and method.
The white screen rule capability supports not only dynamic flow control and degradation, but also the collection of access logs and request contexts at any point.
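How MSE applies such a rule inside a running application is not described in this article. As a rough, standard-library-only illustration of the general idea, guarding a method from the outside by name without editing its body, the sketch below uses a JDK dynamic proxy. The names Greeter, RuntimeGuard, and BLOCKED are hypothetical, and a JDK proxy only works for interface methods, so it could not reach a private method such as a in the demo.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical business interface used only for this illustration. */
interface Greeter {
    String greet(String name);
}

final class RuntimeGuard {
    // "Interface#method" keys that are currently blocked; can be changed while the app runs.
    static final Set<String> BLOCKED = ConcurrentHashMap.newKeySet();

    /** Wraps target so that every call is checked against BLOCKED before it runs. */
    @SuppressWarnings("unchecked")
    static <T> T guard(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    String key = iface.getSimpleName() + "#" + method.getName();
                    if (BLOCKED.contains(key)) {
                        throw new IllegalStateException("blocked: " + key);
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the business exception unchanged
                    }
                });
    }
}
```

Adding "Greeter#greet" to RuntimeGuard.BLOCKED makes subsequent calls through the proxy fail fast, and removing it restores normal behavior, all without touching the Greeter implementation.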
Observe monitoring data for the specified method
We find the target application in Application Governance and view the monitoring data of the specified method com.alibabacloud.mse.demo.AApplication.AController#a under Interface Monitoring > Custom Embedding Points.
Configure flow control rules
Click the "Add Protection Rule" button in the upper right corner of the interface overview to add a flow control rule:
We can configure the simplest kind of flow control rule, in QPS mode. In the example above, each machine is limited to at most one call per second for this interface.
After configuring the rule, wait a moment and the flow control effect appears on the monitoring page:
Rejected traffic also gets an error response. The framework embedding points that ship with MSE all have default flow control handling logic: for example, a Web interface returns 429 Too Many Requests when limited, and DAO-layer and Java methods throw an exception when limited.
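To show what a single-machine QPS limit of 1 with a 429 response amounts to, here is a minimal fixed-window sketch in plain Java. It is an illustration under stated assumptions, not the algorithm MSE or Sentinel uses: the names QpsLimiter and GuardedEndpoint are hypothetical, and the clock is injected only so the behavior can be checked deterministically.

```java
import java.util.function.LongSupplier;

/** Fixed-window limiter: at most maxPerSecond calls pass in each one-second window. */
final class QpsLimiter {
    private final int maxPerSecond;
    private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis
    private long windowStart;
    private int passedInWindow;

    QpsLimiter(int maxPerSecond, LongSupplier clockMillis) {
        this.maxPerSecond = maxPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    /** Returns true if the call may proceed, false if it exceeds the limit. */
    synchronized boolean tryPass() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= 1000) { // a new one-second window begins
            windowStart = now;
            passedInWindow = 0;
        }
        if (passedInWindow < maxPerSecond) {
            passedInWindow++;
            return true;
        }
        return false;
    }
}

/** Maps the limiter decision to an HTTP status, mirroring the Web default described above. */
final class GuardedEndpoint {
    static int handle(QpsLimiter limiter) {
        return limiter.tryPass() ? 200 : 429; // 429 Too Many Requests when limited
    }
}
```

In production code the limiter would be created as new QpsLimiter(1, System::currentTimeMillis). A fixed window is the simplest choice; it allows short bursts at window boundaries, which sliding-window designs smooth out.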
Summary
We abstract the runtime white screen capability into the following rule: WhiteScreenRule = Target + Action
Target:
- ResourceTarget: target interface, supports Web, Rpc, SQL and any custom method
- WorkloadTarget: target instance, you can select all machines or specify machine IP
- TrafficCondition: traffic condition, such as only exceptions, only slow calls, or only traffic carrying full-link grayscale labels
Action:
- Collect diagnostic context: parameters, return values, thread contexts, Target objects, class loader information, etc.
- Choose whether logs are printed on subsequent links
- Apply flow control and degradation
- Tag and color specified traffic (planned)
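The rule abstraction above can be sketched as a small data model. This is only an illustration of the Target + Action shape: apart from WhiteScreenRule, every name here (RuleTarget, RuleAction, the fields, and the matching logic) is an assumption, and the real MSE rule format is not shown in this article.

```java
import java.util.Collections;
import java.util.Set;

/** Actions from the list above; the constant names are illustrative. */
enum RuleAction { COLLECT_DIAGNOSTICS, PRINT_DOWNSTREAM_LOGS, FLOW_CONTROL, TAG_TRAFFIC }

/** Target = ResourceTarget + WorkloadTarget + TrafficCondition. */
final class RuleTarget {
    final String resource;         // ResourceTarget, e.g. "com.example.Demo#a"
    final Set<String> instanceIps; // WorkloadTarget; an empty set means all machines
    final boolean onlyOnException; // one possible TrafficCondition

    RuleTarget(String resource, Set<String> instanceIps, boolean onlyOnException) {
        this.resource = resource;
        this.instanceIps = instanceIps;
        this.onlyOnException = onlyOnException;
    }

    boolean matches(String calledResource, String instanceIp, boolean threw) {
        return resource.equals(calledResource)
                && (instanceIps.isEmpty() || instanceIps.contains(instanceIp))
                && (!onlyOnException || threw);
    }
}

final class WhiteScreenRule {
    final RuleTarget target;
    final Set<RuleAction> actions;

    WhiteScreenRule(RuleTarget target, Set<RuleAction> actions) {
        this.target = target;
        this.actions = actions;
    }

    /** Actions to apply to one call, or an empty set when the target does not match. */
    Set<RuleAction> actionsFor(String calledResource, String instanceIp, boolean threw) {
        return target.matches(calledResource, instanceIp, threw)
                ? actions
                : Collections.<RuleAction>emptySet();
    }
}
```

Keeping the target and the actions separate means one target can carry several actions at once, for example collecting diagnostics and applying flow control on the same method.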
In the near future, MSE will launch log management built on the rule model above and the dynamic Enhance capability. Dynamic Enhance then provides not only flow control and degradation at any point, but also insight into how full-link traffic behaves, which supports decisions on real-time governance and protection.
MSE Sentinel is widely used within Alibaba in e-commerce businesses such as Taobao and Tmall, and has also been put into practice extensively in Internet finance, online education, games, live streaming, and large government agencies and state-owned enterprises. With flow control and degradation available for any method, we can quickly give any microservice system traffic protection, leaving more time to focus on rapid business development. System stability can be left to MSE with confidence: let a professional team do professional work.