Author: Su
The stability of microservices has always been a major concern for developers. As businesses evolve from monolithic to distributed architectures and deployment methods change, the dependencies between services become increasingly complex, and business systems face significant high-availability challenges. Application High Availability Service (AHAS) is a cloud product distilled from years of experience with Alibaba's internal high-availability system. It is built on Sentinel, Alibaba's open source flow control and degradation component, and takes traffic and fault tolerance as its entry point. It helps ensure the stability of services and gateways through traffic control, isolation of unstable calls, circuit breaking and degradation, hotspot traffic protection, adaptive system overload protection, cluster flow control, and service jitter prevention, and it provides second-level traffic monitoring and analysis. AHAS is widely used within Alibaba in e-commerce businesses such as Taobao and Tmall, and it has also been adopted extensively in Internet finance, online education, gaming, live streaming, and other industries, as well as by large government bodies and state-owned enterprises.
In a distributed system architecture, each request passes through many layers of processing: from the ingress gateway to the web server, to calls between services, and then to the caches or storage (such as databases) that the services access. In a high-availability traffic protection system, we recommend applying targeted traffic protection and fault-tolerance measures at each layer of the traffic link to ensure service stability. In the previous article, we introduced how to connect AHAS Sentinel at the Nginx/Ingress gateway layer for front-line traffic protection. In this article, we introduce fine-grained high-availability traffic protection at the web application layer.
Web Server Scenario
AHAS application protection supports native integration for multiple languages such as Java and Go, and supports mainstream web frameworks and components:
- Java: mainstream web frameworks such as Spring Web, Spring WebFlux, Spring Boot, Spring Cloud, Tomcat, Jetty, and Undertow
- Go: Gin, Echo, and other frameworks
Each service has a maximum request capacity that it can carry, and this capacity is usually evaluated through stress testing. When traffic surges suddenly, we need flow control rules on the core Web APIs to keep system traffic within the capacity that can be processed effectively and to avoid being overwhelmed. At the same time, web traffic usually carries many business attributes and parameters, such as IP, user ID, and commodity ID. Many requests show hotspot characteristics, such as a large wave of requests for a single popular commodity. Because traffic is uncertain and unpredictable, we cannot accurately forecast its magnitude, its distribution, or which parameters will become hotspots, so the ability to control hotspot traffic at a fine granularity is also very important.
AHAS Web scenario flow control [1] supports flow control not only in the dimension of URL path, but also fine-grained hotspot flow control on client IP, host, header, request parameters, and so on. When configuring a Web flow control rule, we only need to specify which request attribute to use for hotspot flow control (such as the header whose key is UserId). AHAS then automatically extracts the value of that attribute from each request (such as the value of the header), automatically identifies the top N most frequently accessed values, and controls the request volume for each of them separately. With this dimension of control, we can implement a series of fine-grained high-availability protection strategies on the web server, such as protection against abusive requests from a single IP or against a single hot commodity, and even traffic control policies with business meaning, such as limiting each user to N accesses per minute for each API.
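To make the idea of per-value control concrete, here is a minimal Java sketch of a fixed-window limiter keyed by a request attribute value. It is only an illustration under simplifying assumptions, not the AHAS or Sentinel implementation: the class name `HotspotParamLimiter` and its methods are hypothetical, it tracks every value it sees rather than only the top N hotspots, and it never evicts old entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical illustration of per-value flow control; not the AHAS/Sentinel implementation.
final class HotspotParamLimiter {
    // Counter for one parameter value within the current time window.
    private static final class Window {
        long windowStart;
        int count;

        Window(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clock; // injected so behavior can be checked deterministically
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    HotspotParamLimiter(int maxPerWindow, long windowMillis, LongSupplier clock) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Returns true if a request carrying this attribute value may pass. */
    boolean tryPass(String paramValue) {
        long now = clock.getAsLong();
        Window w = windows.computeIfAbsent(paramValue, k -> new Window(now));
        synchronized (w) {
            // Start a fresh window once the previous one has elapsed.
            if (now - w.windowStart >= windowMillis) {
                w.windowStart = now;
                w.count = 0;
            }
            if (w.count < maxPerWindow) {
                w.count++;
                return true;
            }
            return false;
        }
    }
}
```

For example, `new HotspotParamLimiter(10, 60_000, System::currentTimeMillis)` would allow each distinct value (say, each UserId header value) at most 10 passes per minute, independently of all other values.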
Web Client Scenario
In addition to flow control at the web server entry, AHAS Sentinel also provides adapters for common web clients, such as OkHttp, Apache HttpClient, and Spring RestTemplate. Based on the web client adapter modules, we can protect the HTTP client at the point where it calls other interfaces, so that the calling end is not dragged down by slow or faulty interfaces:
- When the web caller calls a very slow interface at a high request volume, uncontrolled slow calls will fill the entire connection pool. Calls to other, healthy services are then blocked, and the caller's own ability to process requests may collapse as well. In this case, the concurrency control rule [2] can be used to protect the calling end by limiting the number of concurrent threads on the API path (that is, the number of simultaneous calls), which prevents the connection pool from filling up.
- When an interface called by the web client keeps producing slow calls or exceptions, the AHAS circuit breaker rule [3] can be used for further protection. Once the proportion of slow calls or exceptions on an API path reaches a certain threshold, calls to that API are automatically short-circuited for a period of time, and the preset degradation response is returned directly. The circuit breaker mechanism prevents the calling end from being dragged down by an unstable third-party interface, and it also gives the unstable interface time to recover instead of letting continuous calls make the service deteriorate further.
- When non-fatal errors such as occasional timeouts occur while the web client calls an interface, the automatic retry rule [4] can be used to maintain the business success rate and avoid business jitter.
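The concurrency control idea from the first item above can be sketched with a plain `java.util.concurrent.Semaphore`. This is only an illustration of limiting simultaneous calls, not how AHAS implements its rule; the class name `ConcurrencyLimitedCall` and the fallback parameter are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration of concurrency (bulkhead) control; not the AHAS implementation.
final class ConcurrencyLimitedCall {
    private final Semaphore permits;

    ConcurrencyLimitedCall(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs remoteCall only if fewer than maxConcurrent calls are in flight.
     * Otherwise returns the fallback immediately instead of queueing,
     * so slow calls cannot occupy every thread or connection.
     */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return remoteCall.get();
        } finally {
            permits.release(); // always free the slot, even if the call throws
        }
    }
}
```

Rejecting immediately rather than waiting for a permit is a deliberate choice in this sketch: it keeps the caller responsive when the downstream interface is slow.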
Circuit breaker rules and concurrency control rules are effective means of keeping the calling end stable, and they are well suited to both web client scenarios and slow SQL scenarios.
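As a rough illustration of the circuit breaker behavior, the following Java sketch opens the circuit when the exception ratio reaches a threshold, returns a fallback while open, and lets a probe call through after a recovery period. It is a simplified model under stated assumptions, not the AHAS implementation: the class `SimpleCircuitBreaker` and all its parameters are hypothetical, statistics are cumulative rather than sliding-window, and only exceptions (not slow calls) are counted.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration of an exception-ratio circuit breaker; not the AHAS implementation.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final double failureRatioThreshold;
    private final int minRequests;
    private final long openMillis;
    private final LongSupplier clock; // injected so behavior can be checked deterministically

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(double failureRatioThreshold, int minRequests,
                         long openMillis, LongSupplier clock) {
        this.failureRatioThreshold = failureRatioThreshold;
        this.minRequests = minRequests;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Calls remoteCall unless the circuit is open; returns the fallback on rejection or exception. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (!allow()) {
            return fallback.get();
        }
        try {
            T result = remoteCall.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback.get();
        }
    }

    private synchronized boolean allow() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return false;
            }
            state = State.HALF_OPEN; // recovery period elapsed: let one probe through
            return true;
        }
        return state == State.CLOSED; // in HALF_OPEN a probe is already in flight
    }

    private synchronized void record(boolean success) {
        if (state == State.OPEN) {
            return; // another call already tripped the breaker
        }
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED;
                total = 0;
                failures = 0;
            } else {
                trip();
            }
            return;
        }
        total++;
        if (!success) {
            failures++;
        }
        if (total >= minRequests && (double) failures / total >= failureRatioThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        failures = 0;
    }
}
```

While the circuit is open, callers receive the fallback without touching the unstable interface, which is what gives that interface time to recover.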
Getting Started Quickly with AHAS Web Scenario Protection
In the first step, we connect the web service to AHAS traffic protection. AHAS provides a variety of fast and convenient non-intrusive integration methods.
Taking a common Spring Boot web service as an example, once it is connected to AHAS and a service call or interface access has been triggered, you can see the service on the AHAS console [5] and its web interfaces on the Web scenario page. By default, a Spring Boot application uses the URL paths defined in its controllers as resource names.
We can observe the real-time traffic and response time of each interface very intuitively on the monitoring page, so that we can evaluate the stability of the system.
In the second step, we configure a Web flow control rule for one of the interfaces [6]. In our example, we configure hotspot parameter flow control for the /hello API. The request attribute used for flow control is the request parameter (URL parameter) named name, and the flow control policy limits each hotspot parameter value to at most one access per second. AHAS automatically extracts the value of the name parameter from each /hello request, identifies the top K most frequent values, and then controls each hotspot value separately so that it does not exceed the number of accesses set in the rule.
Example of the effect: suppose the /hello interface receives many requests, and the two name parameter values in /hello?name=A and /hello?name=B are accessed very frequently and are identified as hotspot values by AHAS. The flow control rule above then limits requests with name=A and requests with name=B to no more than one access per second each.
In the third step, we bind a fallback behavior [7] to the above Web flow control rule, that is, specify the return content of the Web interface after the rule is triggered. We can customize the web return behavior in the page. The following is an example JSON return configuration that returns a 429 status code:
After the configuration is complete, we can select the fallback behavior we created on the process page and save it, so that our rules will take effect in real time.
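To illustrate what a fallback behavior amounts to from the application's point of view, here is a minimal Java sketch that returns a fixed 429 JSON response when a request is blocked and runs the normal handler otherwise. The class `WebFallback`, its `Response` type, and the JSON body are hypothetical and are not the AHAS console configuration format; in AHAS the fallback is configured on the console rather than in code.

```java
import java.util.function.Supplier;

// Hypothetical illustration of a web fallback; not the AHAS configuration format.
final class WebFallback {
    /** Minimal HTTP-like response: status code, content type, and body. */
    static final class Response {
        final int status;
        final String contentType;
        final String body;

        Response(int status, String contentType, String body) {
            this.status = status;
            this.contentType = contentType;
            this.body = body;
        }
    }

    // Assumed fallback content for the example: HTTP 429 with a small JSON body.
    static final Response TOO_MANY_REQUESTS = new Response(
            429, "application/json",
            "{\"code\":429,\"message\":\"Too many requests\"}");

    /** Runs the business handler when the request passed flow control, else returns the fallback. */
    static Response handle(boolean passed, Supplier<Response> businessHandler) {
        return passed ? businessHandler.get() : TOO_MANY_REQUESTS;
    }
}
```

The business handler is never invoked for a blocked request, so a flow-controlled request costs the service almost nothing beyond writing the fallback response.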
In the fourth step, we send traffic to the /hello interface with different values for the name parameter. On the console's monitoring page, we can see that the requests trigger flow control.
At the same time, the response returned for flow-controlled hotspot requests matches the fallback behavior we defined in the console.
In a production environment, these parameters may take a very large number of values. For example, there may be tens of millions of product IDs. After hotspot requests are flow-controlled, we usually want to know which top parameter values were restricted, so that we can understand how the business is being requested. AHAS recently launched a hotspot monitoring feature that makes hotspot parameters observable. Combined with the top parameter heatmap, it lets you see the real-time business situation intuitively on the console.
The above is a simple example of flow control in a Spring Boot web scenario. You are welcome to try it out in the AHAS console. For more information about web scenario protection, see the reference documentation in the links below.
Related Links
[1] AHAS Web scene flow control
https://help.aliyun.com/document_detail/337922.html
[2] Concurrency control rules
https://help.aliyun.com/document_detail/146242.html
[3] Circuit breaker rules
https://help.aliyun.com/document_detail/101078.html
[4] Automatic retry rules
https://help.aliyun.com/document_detail/194976.html
[5] AHAS console
https://common-buy.aliyun.com/?commodityCode=ahas_001#/buy
[6] Web flow control rules
https://help.aliyun.com/document_detail/337922.html
[7] fallback behavior