Abstract: ROMA Connect, the core system of the ROMA platform, is derived from Huawei's process IT integration platform, and has more than 15 years of enterprise business integration experience within Huawei.
This article is shared from the Huawei Cloud Community post "ROMA Integration Key Technology (1): API Flow Control Technology", author: middleware brother.
1 Overview
ROMA Connect, the core system of the ROMA platform, is derived from Huawei's process IT integration platform and builds on more than 15 years of enterprise business integration experience within Huawei. With ROMA Connect, basic platforms such as the Internet of Things, big data, video, unified communications, and GIS, together with the services, messages, and data of each application, can be unified, integrated, adapted, and orchestrated. This shields upper-layer business from the interface differences between platforms and provides service, message, and data integration capabilities that support rapid development and deployment of new services and improve application development efficiency. ROMA Connect suits scenarios such as safe parks, smart cities, and enterprise digital transformation. Figure 1 shows the functional view of ROMA Connect.
Figure 1 ROMA Connect function view
Among these components, APIC (API Connect) is a core component that contains the API Gateway capabilities and carries API integration and API opening. Flow control, a key feature of API Gateway, provides fast and effective protection for users' API integration and opening. This article describes the implementation of API Gateway flow control in detail and explains the technical details of high-performance, second-level flow control.
2 Flow control requirements of high-concurrency and high-throughput systems
2.1 The motivation of flow control
In a high-concurrency, high-throughput system, the usual technical keywords are degradation, caching, and flow control. Flow control is the core technology among them, although these techniques complement each other.
- The value of flow control
- Improve system stability/prevent avalanches
- Guarantee high-priority services
- Reduce response delay and improve user experience
- Improve system effective throughput
- Restrict business usage, etc.
- …
- Target parameters of flow control
- Limit total concurrency (such as a database connection pool or thread pool)
- Limit instantaneous concurrency (such as the limit_conn module of nginx)
- Limit the average rate within the time window
- Limit remote interface call rate
- Limit the consumption rate of MQ
- Limit network traffic
- Limit CPU and memory usage
- …
2.2 Business challenges
In large business scenarios, the main challenges are high concurrency, low latency, high precision, and flexible expansion in multiple dimensions.
Figure 2 Business challenges
The specific challenges for flow control are as follows:
- Limits as different as 10 calls per day and 100,000 calls per minute must coexist
- The flow control feedback cycle is longer than the flow control cycle
- Flow control has many dimensions
- Synchronous flow control processing time affects user experience
- Statically configured flow control thresholds end up either too high or too low
- A flow control failure causes a business failure
- Flow control nodes are complex to deploy and consume many resources
- …
3 Analysis of common flow control technologies
3.1 Common flow control logic architecture
Figure 3 Common flow control logic architecture
The advantages and disadvantages of various solutions are shown in the following table:
3.2 Common flow control algorithms
3.2.1 Counter Algorithm
Advantages: 1. The algorithm is simple and easy to implement.
Disadvantages: 1. The output is not smooth. 2. There is a boundary problem: a surge can occur at the boundary between two flow control cycles, greatly exceeding the flow control threshold and overwhelming the back-end service.
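The counter algorithm and its boundary problem can be illustrated with a minimal sketch. The class name `FixedWindowCounter` and the explicit `now` argument (used instead of a real clock so the behavior is reproducible) are assumptions made for illustration, not part of ROMA Connect.

```python
class FixedWindowCounter:
    """Counter algorithm: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_index = None  # index of the window currently being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.window_index:
            # A new flow control cycle started: the counter is reset to zero.
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(limit=100, window=1.0)
# 100 requests at the very end of one cycle, 100 more at the start of the next.
late = sum(limiter.allow(0.99) for _ in range(100))
early = sum(limiter.allow(1.01) for _ in range(100))
print(late + early)  # 200
```

Although the limit is 100 requests per second, 200 requests pass within about 0.02 seconds because the counter resets at the cycle boundary.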
3.2.2 Sliding window algorithm
Advantages: 1. It solves the boundary problem of the counter algorithm. 2. The algorithm is simple and easy to implement.
Disadvantages: 1. The higher the accuracy requirement, the more window grids are needed, and the memory overhead is larger. 2. The output is not guaranteed to be smooth.
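A sliding window can be sketched by splitting the window into grids and summing only the grids still inside the window. The names below and the use of integer milliseconds are illustrative assumptions; this is one common way to implement the idea, not necessarily the one used in ROMA Connect.

```python
from collections import deque


class SlidingWindowCounter:
    """Sliding window: at most `limit` requests in the last `grids` grids."""

    def __init__(self, limit: int, window_ms: int, grids: int):
        self.limit = limit
        self.grids = grids
        self.grid_ms = window_ms // grids
        self.slots = deque()  # [grid_index, count] pairs, oldest first
        self.total = 0        # requests counted in the grids still in the window

    def allow(self, now_ms: int) -> bool:
        index = now_ms // self.grid_ms
        # Drop grids that have slid out of the window.
        while self.slots and self.slots[0][0] <= index - self.grids:
            _, expired = self.slots.popleft()
            self.total -= expired
        if self.total >= self.limit:
            return False
        if self.slots and self.slots[-1][0] == index:
            self.slots[-1][1] += 1
        else:
            self.slots.append([index, 1])
        self.total += 1
        return True


limiter = SlidingWindowCounter(limit=100, window_ms=1000, grids=10)
late = sum(limiter.allow(990) for _ in range(100))    # end of the first second
early = sum(limiter.allow(1010) for _ in range(100))  # start of the next second
print(late, early)  # 100 0
```

The burst just after the one-second mark is rejected because the requests from 990 ms are still inside the window. More grids give finer accuracy at the cost of more memory, which matches the trade-off listed above.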
3.2.3 Leaky Bucket Algorithm
Advantages: 1. The output rate is constant and independent of the input rate, so the flow control effect is smooth. 2. No boundary problem. 3. Does not rely on tokens.
Disadvantages: 1. Because the output rate of the leaky bucket is constant, bursts of requests are not supported. 2. If the bucket is full, incoming requests are discarded.
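As a rough sketch, the leaky bucket can be modeled as a water level that drains at a constant rate; a request is discarded when adding it would overflow the bucket. The class and parameter names are assumptions for illustration, and the clock is passed in explicitly to keep the example deterministic.

```python
class LeakyBucket:
    """Leaky bucket used as a meter: water drains at `leak_rate` per second."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.water = 0.0  # requests currently held in the bucket
        self.last = 0.0   # time of the previous call

    def allow(self, now: float) -> bool:
        # Drain at a constant rate for the time elapsed since the last call.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 > self.capacity:
            return False  # bucket is full: the request is discarded
        self.water += 1
        return True


bucket = LeakyBucket(capacity=2, leak_rate=1.0)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(results)  # [True, True, False, True]
```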
3.2.4 Token Bucket Algorithm
Advantages: 1. Allows a certain degree of burst traffic. 2. Complex flow control strategies can be built by customizing how tokens are added. 3. No boundary problem.
Disadvantages: 1. When no token is available in the bucket, incoming requests are discarded immediately. 2. Processing requests by priority is not supported.
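A basic token bucket might look like the sketch below. Tokens are refilled lazily based on elapsed time; the names and the explicit `now` argument are illustrative assumptions.

```python
class TokenBucket:
    """Token bucket: refills `refill_rate` tokens per second up to `capacity`."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0                # time of the previous call

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at the capacity.
        earned = (now - self.last) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # no token available: the request is discarded


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = sum(bucket.allow(0.0) for _ in range(6))
later = [bucket.allow(2.0) for _ in range(3)]
print(burst, later)  # 5 [True, True, False]
```

The full bucket absorbs a burst of 5 requests at once, after which requests pass only as fast as tokens are refilled.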
4 Implementation of ROMA Connect flow control technology
4.1 Overall strategy
High precision and high throughput: distinguish flow control in different scenarios and use different strategies and algorithms
- Persist high-precision, low-throughput flow control; use a pure in-memory counting strategy for high-throughput, high-frequency flow control
- High-throughput, high-frequency flow control has no HA guarantee; data is cleared and recounted after a failure
Multi-dimensional and multi-priority: use Policy-based multi-dimensional control, where a single request can trigger multiple policies
- Decouple complex control by mapping it to multiple simple policies, reducing complexity for users
- A single request can trigger all policies whose conditions it meets, and these policies jointly perform flow control
Reduce request latency and the Controller workload through policy distribution, asynchronous processing, batch reporting, and similar mechanisms.
- Process at the Filter/SDK level as much as possible so that flow control requests do not add business latency
- Report to the Controller as little as possible to reduce the Controller load and improve its efficiency
- Filter and algorithm threshold are degraded and released to avoid the impact of
Adopt a Key/Value model with multiple dimensions, and provide a general mechanism that adapts to the flow control requirements of different scenarios and different applications
- API Gateway is the first application scenario
- The Controller does not need to understand the specific business, and the SDK-encapsulated Filter adapts the specific business and the flow control Controller
4.2 Logical View
- The RateLimit SDK reaches the RateLimit Controller through shards selected by consistent hashing; high-precision, high-throughput flow control is computed centrally in Controller memory.
- For high-precision, high-throughput flow control, the RateLimit Controller computes only in local memory and does not need to retain historical rate-limiting information after a crash.
- For high-precision, low-throughput flow control, the RateLimit Controller persists state asynchronously so that flow control remains accurate after a Controller crash.
- When the RateLimit Controller service is unavailable, the RateLimit SDK degrades automatically.
- The flow control strategy can be adjusted dynamically based on feedback such as API response latency collected by API Gateway.
- SLA-based flow control policies are supported.
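The consistent-hash sharding mentioned in the first item can be sketched as follows. The ring layout, the virtual-node count, the SHA-256 hash, and the controller and key names are all assumptions for illustration; the article does not describe how the RateLimit SDK implements it.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps a flow control key to a Controller shard via a hash ring."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times (virtual nodes).
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


full = ConsistentHashRing(["controller-0", "controller-1", "controller-2"])
smaller = ConsistentHashRing(["controller-0", "controller-1"])
keys = [f"api-{i}" for i in range(50)]
# Keys that were not on the removed shard keep their shard.
stable = all(
    smaller.node_for(k) == full.node_for(k)
    for k in keys
    if full.node_for(k) != "controller-2"
)
print(stable)  # True
```

Because only the keys of a removed shard move, in-memory counters on the remaining shards stay valid when the Controller cluster changes size.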
4.3 Architecture design
Use an independent Controller solution
- An independent Controller cluster provides globally accurate, high-throughput flow control
- The Controller uses a sharding mechanism internally
Adopt a common Policy and Key/Value model
- Adopt an extensible Domain/Policy mechanism to adapt to different business scenarios
- Different policies are associated with different algorithms
Provide an SDK and tools, and develop plug-ins for API Gateway and other components
- Provide a reusable SDK, debugging tools, etc.
- Pre-implemented flow control plug-ins for API Gateway and other components
External log and flow control data analysis module
- Feed back to the configuration/strategy management module through data mining, forecasting, etc., and dynamically revise the flow control strategy
4.4 Built-in algorithm
4.4.1 Token Bucket Algorithm with Cache and Color
Problems with the token bucket algorithm:
- When no token is available, the request will be rejected immediately. The user may continue to send requests until a token is available. This will increase the load of the API Gateway and the load of the flow control service.
- All requests get tokens with the same probability, and priority is not supported. In practical applications, some requests need to be processed first, while other requests can be delayed or rejected directly. For example, payment requests from e-commerce websites should be processed first, while requests to browse merchandise can be delayed.
To address these problems, a token bucket algorithm that supports caching and priority was designed.
Cache:
- When there are no available tokens, the request is temporarily placed in the request queue, and then processed when there are available tokens.
- The FCFS algorithm is used to process the request.
- If there is no free space in the cache, the request is rejected directly.
Token:
- Tokens are divided into multiple colors, and different colors represent different priorities; for example, green, yellow, and red indicate priority from high to low.
- In the API configuration file, you can configure the priority of different APIs. According to the pre-configured priority, a token of the corresponding color is allocated to the request. If the request has no priority, the default priority is used.
- Configure the number of tokens according to the capabilities of the API Gateway system.
- When a low-priority request arrives and the number of high-priority tokens is greater than the reserved amount, a high-priority token can also be allocated to the low-priority request. A reserve is set for each token color so that low-priority requests cannot exhaust the high-priority tokens.
- There is a separate request cache for each color token.
4.4.2 High-precision and high-throughput flow control algorithm
Problem: the conflict between high precision and high throughput
- In order to achieve high-precision flow control, API Gateway needs to send a flow control request to the flow control service for each API request, which will greatly reduce the throughput of processing requests.
- In order to improve throughput, API Gateway needs to reduce the frequency of sending flow control requests, which will reduce the accuracy of flow control. The lower the frequency of sending flow control requests, the lower the accuracy of flow control.
Solution: HAT (High Accuracy, High Throughput), a high-precision, high-throughput flow control algorithm
- The flow control is divided into an autonomous flow control stage and a flow control service flow control stage.
- Suppose the flow control threshold is L, the autonomous flow control threshold is S, the number of API Gateway cluster nodes is N, and the number of APIs processed in the current flow control cycle is R.
- Flow control service calculation: autonomous flow control threshold S = L/N, and distributed to each API Gateway node.
- Within the autonomous flow control threshold, each API Gateway node can perform autonomous flow control without sending flow control requests to the flow control service.
- When the number of API requests handled by a node in the API Gateway cluster exceeds S - α (a margin α below the autonomous flow control threshold), the node sends a flow control request to the flow control service to apply for a new threshold. The flow control service then contacts the other API Gateway nodes to obtain the number of API requests they have handled, recalculates the autonomous flow control threshold as S = (L - R)/N, and sends it to each API Gateway node.
- When the remaining flow (the flow balance) falls below δ, the autonomous flow control threshold is no longer updated.
- When the next flow control cycle begins, the flow control service resets S, and each API Gateway node contacts the flow control service to update its autonomous flow control threshold.
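The two-stage idea can be simulated in a few lines. This is a simplified sketch under stated assumptions: S is treated as the number of further requests a node may accept on its own, integer division is used, updates also stop once S would be zero, the initial distribution is counted as the first update, and all class and function names are hypothetical. It is not the ROMA Connect implementation.

```python
class GatewayNode:
    def __init__(self, name):
        self.name = name
        self.handled = 0    # requests accepted in the current cycle
        self.allowance = 0  # requests the node may still accept on its own (S)

    def handle(self, service):
        # Within alpha of the local threshold: apply for a new threshold.
        if self.allowance <= service.alpha and not service.frozen:
            service.rebalance()
        if self.allowance > 0:
            self.allowance -= 1
            self.handled += 1
            return True
        return False


class FlowControlService:
    def __init__(self, limit, nodes, alpha=0, delta=1):
        self.limit, self.nodes = limit, nodes  # L and the N gateway nodes
        self.alpha, self.delta = alpha, delta
        self.updates = 0     # u: how many times S was distributed
        self.frozen = False  # True once the remaining flow is below delta
        self.rebalance()     # initial distribution, S = L / N

    def rebalance(self):
        handled = sum(n.handled for n in self.nodes)  # R
        remaining = self.limit - handled              # L - R
        share = remaining // len(self.nodes)          # S = (L - R) / N
        if remaining < self.delta or share == 0:
            self.frozen = True  # stop updating; nodes keep what they have
            return
        for node in self.nodes:
            node.allowance = share
        self.updates += 1


def run(limit, node_count, targets):
    nodes = [GatewayNode(f"gw-{i}") for i in range(node_count)]
    service = FlowControlService(limit, nodes, alpha=0, delta=node_count)
    accepted = sum(nodes[t].handle(service) for t in targets)
    return accepted, service.updates


balanced = run(100, 4, [i % 4 for i in range(120)])  # round-robin traffic
skewed = run(100, 4, [0] * 120)                      # all traffic on one node
print(balanced)  # (100, 1)
```

With evenly spread traffic the threshold is distributed once and exactly L requests pass. When all traffic hits one node, the threshold is redistributed several times, and the total accepted never exceeds L.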
Algorithm analysis
- Let u be the number of times the autonomous flow control threshold is updated in a single flow control cycle, and let Pi be the speed at which the i-th API Gateway node processes API requests.
- The number of flow control requests in a single flow control cycle is reduced from L to u*N.
- The optimal situation is that the performance of each node in the API Gateway cluster is exactly the same. At this time, u = 1. When the flow control threshold is 10000 and the number of API Gateway nodes is 10, the flow control request in a single flow control cycle drops from 10000 to 10.
- The closer the performance of each node of the API Gateway cluster is, the closer u is to 1. The greater the performance gap of each node of the API Gateway cluster, the greater u.
4.4.3 Dynamic flow control algorithm
Dynamic flow control based on running status, trend, and API call chain.
- After a token is requested, the flow control service processes the request and generates a flow control response (accept/reject, degrade, or blacklist/whitelist).
Dynamic flow control strategy based on running status
- Dynamically modify the flow control threshold according to operating status such as network status (available connections, network latency), request processing latency, and API Gateway CPU and memory usage. Requests can also be made to wait.
- When the usage of CPU, memory, etc. is far below the threshold, requests are processed normally.
- When the usage of CPU, memory, etc. approaches the threshold, lower the flow control threshold (degrade) to reduce the load on API Gateway.
- When the usage of CPU, memory, etc. far exceeds the threshold, lower the flow control threshold faster.
- When there is no CPU or memory available, requests are rejected directly.
- When the usage of CPU, memory, etc. drops back to a normal level, the flow control threshold is restored.
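The rules above can be expressed as a single adjustment function. The concrete numbers (an 80% resource threshold, a 10-point margin, lowering by 10% or by half) and the function name are assumptions chosen only to make the sketch concrete.

```python
def adjust_threshold(current, base, usage_pct, limit_pct=80, margin_pct=10):
    """Return the next flow control threshold for a resource usage in percent.

    current: flow control threshold in effect now
    base:    configured flow control threshold under normal load
    """
    if usage_pct >= 100:
        return 0  # no CPU or memory left: reject every request
    if usage_pct > limit_pct + margin_pct:
        return max(1, current // 2)       # far above the limit: lower faster
    if usage_pct >= limit_pct - margin_pct:
        return max(1, current * 9 // 10)  # approaching the limit: lower gently
    return base                           # normal level: restore the threshold


threshold = 1000
history = []
for usage in [30, 75, 95, 95, 100, 40]:
    threshold = adjust_threshold(threshold, 1000, usage)
    history.append(threshold)
print(history)  # [1000, 900, 450, 225, 0, 1000]
```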
Dynamic flow control strategy based on running status trend
- Use machine learning to analyze historical data, generate predictive models, predict the load of API Gateway, modify flow control thresholds or downgrade services in advance to ensure smooth and stable API Gateway load.
- Use machine learning to discover requests that should be added to the blacklist.
Dynamic flow control strategy based on API call flow
- Case: API call flow.
Design a dynamic flow control strategy based on API call flow.
- Use machine learning to discover API call flow. The flow control service saves the API call relationship.
- When the system load is high and an API request reaches its threshold and is limited, all related API requests of the same or lower level no longer access Redis to obtain real-time data for processing; they are delayed or rejected directly.
- When the API Gateway system load is normal, the dynamic flow control strategy is not activated.
- In this way, the load on API Gateway and on the flow control service is reduced with essentially no impact on API processing speed.
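One possible reading of this strategy is sketched below. The call relationships are given as a plain dictionary rather than learned, a larger level number is assumed to mean a lower priority, and the class, method, and API names are hypothetical.

```python
class CallFlowLimiter:
    def __init__(self, related, level):
        self.related = related        # api -> related apis (saved call relationship)
        self.level = level            # api -> level; larger number = lower level
        self.short_circuited = set()  # apis handled without the flow control service

    def on_limited(self, api, high_load):
        """Called when `api` has reached its threshold and is being limited."""
        if not high_load:
            return  # the strategy is not activated under normal load
        for other in self.related.get(api, ()):
            if self.level[other] >= self.level[api]:  # same or lower level
                self.short_circuited.add(other)

    def route(self, api):
        if api in self.short_circuited:
            return "reject_locally"  # no real-time lookup is made
        return "check_flow_control_service"


limiter = CallFlowLimiter(
    related={"list_items": {"item_detail", "recommend", "pay"}},
    level={"pay": 0, "list_items": 1, "item_detail": 1, "recommend": 2},
)
limiter.on_limited("list_items", high_load=True)
routes = [limiter.route(api) for api in ["item_detail", "recommend", "pay"]]
print(routes)  # ['reject_locally', 'reject_locally', 'check_flow_control_service']
```

In this sketch the higher-level `pay` API still goes through normal flow control, while related APIs at the same or a lower level are short-circuited locally.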