
Authors: Chang Huaixin (Yizhai), Ding Tianchen (Yingyu)

Annoying CPU throttling affects container performance, and sometimes people have to sacrifice container deployment density to avoid it. The CPU Burst technology we designed guarantees the service quality of containers without reducing deployment density. The CPU Burst feature has been merged into Linux 5.14, and Anolis OS 8.2, Alibaba Cloud Linux 2, and Alibaba Cloud Linux 3 also support it.

In Kubernetes, the upper limit of a container's CPU resources is specified by the CPU limits parameter. Setting an upper limit on CPU resources keeps an individual container from consuming excessive CPU time and ensures that other containers get the CPU resources they need. CPU limits are implemented in the Linux kernel by the CPU Bandwidth Controller, which restricts a cgroup's resource consumption through CPU throttling. When processes in a container use more CPU than the limit allows, they are throttled: the CPU time they can use is cut back, and key latency metrics of the service degrade.

Faced with this, what do we usually do? Normally, we take the container's daily peak CPU utilization and multiply it by a relatively safe factor to set the container's CPU limit. This avoids the degraded service quality caused by throttling while still keeping CPU utilization reasonable. As a simple example, suppose a container's daily peak CPU usage is around 250%. We set its CPU limit to 400% to guarantee service quality, and the container's CPU utilization is then 62.5% (250%/400%).

But is life really that easy? Obviously not: CPU throttling occurs far more often than expected. What can we do? It seems the only option is to keep raising the CPU limit. In many cases, only when the container's CPU limit is enlarged 5 to 10 times is its service quality reasonably well guaranteed, and the container's overall CPU utilization correspondingly drops to just 10% to 20%. So, to absorb possible peaks in container CPU usage, deployment density has to be reduced dramatically.

Historically, people have fixed a number of throttling problems caused by bugs in the CPU Bandwidth Controller. We found that the remaining unexpected throttling is caused by bursty CPU usage at the 100ms level, and proposed the CPU Burst technology, which allows a certain amount of burst CPU usage so that throttling is avoided as long as the average CPU utilization stays below the limit. In cloud computing scenarios, the value of CPU Burst is as follows:

  1. Improve CPU resource service quality without increasing CPU resource allocation;
  2. Allow resource owners to reduce CPU resource allocation without sacrificing service quality, improving CPU resource utilization;
  3. Reduce resource cost (TCO, Total Cost of Ownership).

The CPU utilization you see is not the whole truth

Second-level CPU utilization cannot reveal the 100ms-level CPU usage that the Bandwidth Controller actually works with, and this is the cause of unexpected CPU throttling.

The Bandwidth Controller applies to CFS tasks and uses period and quota to manage a cgroup's CPU time consumption. If a cgroup's period is 100ms and its quota is 50ms, the cgroup's processes can use at most 50ms of CPU time in each 100ms period. When CPU usage in a 100ms period exceeds 50ms, the processes are throttled, and the cgroup's CPU usage is limited to 50%.
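The knobs behind this are the cgroup files cpu.cfs_period_us and cpu.cfs_quota_us. Below is a minimal sketch of setting exactly the 100ms/50ms example above, assuming the cgroup v1 CPU controller is mounted at /sys/fs/cgroup/cpu and a hypothetical cgroup named "mygroup" already exists:

```python
# Minimal sketch: cap a cgroup at 50ms of CPU time per 100ms period (50% of one CPU).
# Assumes cgroup v1 at /sys/fs/cgroup/cpu and an existing cgroup "mygroup"
# (hypothetical name); run with sufficient privileges.
from pathlib import Path

CG = Path("/sys/fs/cgroup/cpu/mygroup")

def set_cpu_limit(period_us: int, quota_us: int) -> None:
    (CG / "cpu.cfs_period_us").write_text(str(period_us))
    (CG / "cpu.cfs_quota_us").write_text(str(quota_us))

set_cpu_limit(period_us=100_000, quota_us=50_000)
```

For the 400%-limit container from the earlier example, the same call would be set_cpu_limit(period_us=100_000, quota_us=400_000).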

CPU utilization is the average CPU usage over some time window. When usage is measured at a coarse granularity, CPU utilization looks stable; as the observation granularity gets finer, the bursty nature of CPU usage becomes apparent. Observing the same container load with both 1s granularity and 100ms granularity: at 1s granularity the CPU utilization averages about 250%, while at the 100ms granularity on which the Bandwidth Controller operates, CPU utilization peaks above 400%.

[Figure: CPU utilization of the same container load observed at 1s granularity (about 250% on average) and at 100ms granularity (peaks above 400%)]

Based on the observed 250% second-level CPU utilization, the container's quota and period are set to 400ms and 100ms. The container process's fine-grained bursts are throttled by the Bandwidth Controller, and its CPU usage is affected.
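To see this effect on your own machine, a rough sketch like the following reads a cgroup's cumulative CPU usage every 100ms and compares the 100ms peaks with the 1s average. It assumes cgroup v1 cpuacct accounting and a hypothetical cgroup named "mygroup":

```python
# Minimal sketch: observe a cgroup's CPU usage at 100ms granularity and
# compare it with the 1s average. Assumes cgroup v1 cpuacct at
# /sys/fs/cgroup/cpuacct/mygroup (hypothetical cgroup name).
import time
from pathlib import Path

USAGE = Path("/sys/fs/cgroup/cpuacct/mygroup/cpuacct.usage")  # cumulative nanoseconds

def sample_utilization(interval_s: float, samples: int):
    """Return per-interval CPU utilization (1.0 == one full CPU)."""
    out = []
    prev = int(USAGE.read_text())
    for _ in range(samples):
        time.sleep(interval_s)
        cur = int(USAGE.read_text())
        out.append((cur - prev) / (interval_s * 1e9))
        prev = cur
    return out

fine = sample_utilization(0.1, 10)            # ten 100ms windows = one second
print("100ms peak :", max(fine))              # may be far above the 1s average
print("1s average :", sum(fine) / len(fine))
```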

How to improve

We use CPU Burst technology to satisfy this fine-grained burst demand, introducing the concept of burst on top of the traditional quota and period of the CPU Bandwidth Controller. While the container's CPU usage is below the quota, unused quota accumulates as burst resources; when the container's CPU usage exceeds the quota, it is allowed to spend the accumulated burst resources. The net effect is that over longer time scales the container's average CPU consumption is still bounded by the quota, while short-term CPU usage is allowed to exceed it.

[Figure: with CPU Burst, quota left unused in earlier periods accumulates as burst resources that can be spent in later periods]

If we used the Bandwidth Controller algorithm to manage vacation days, the management period would be one year and the year's vacation allowance would be the quota. With CPU Burst, vacation days that are not used this year can be carried over and taken later.
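The accumulate-and-spend behaviour described above can be captured in a few lines. The toy model below only illustrates the accounting idea; it is not the kernel implementation:

```python
# Toy model of the burst accounting described above (not the kernel code):
# unused quota accumulates as burst "tokens" up to a cap, and a period may
# temporarily run past its quota by drawing on those tokens.
def run_periods(demands, quota, burst_cap):
    """demands: CPU demand per period; returns CPU actually granted per period."""
    tokens = 0.0
    granted = []
    for d in demands:
        allowed = quota + tokens               # this period's ceiling
        use = min(d, allowed)
        granted.append(use)
        # unused quota is saved (capped); overuse is paid from the tokens
        tokens = min(burst_cap, tokens + quota - use)
    return granted

# Example: demand around 25ms per 100ms period with one 60ms spike;
# quota 40ms and burst cap 40ms -> the spike is no longer throttled.
print(run_periods([25, 25, 60, 25], quota=40, burst_cap=40))   # [25, 25, 60, 25]
```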

After enabling CPU Burst in a container scenario, the service quality of the test container improved significantly: the mean RT dropped by 68% (from 30+ms to 9.6ms), and the 99th-percentile RT dropped by 94.5% (from 500+ms to 27.37ms).

[Figure: RT of the test container before and after enabling CPU Burst]

Guarantee of CPU Bandwidth Controller

Using the CPU Bandwidth Controller prevents certain processes from consuming too much CPU time and ensures that every process that needs CPU gets enough of it. The reason this stability guarantee holds is that, when the Bandwidth Controller settings satisfy the following condition:

Σ_i quota_i ≤ period

there is the following scheduling stability constraint:

d_i ≤ quota_i (for every cgroup i)

where quota_i and d_i are the quota of the i-th cgroup and that cgroup's CPU demand within one period, respectively.

The Bandwidth Controller accounts CPU time separately for each period, and the scheduling stability constraint ensures that all work submitted within a period can be processed within that same period. For every CPU cgroup, this means that tasks submitted at any time can be completed within one period, that is, the task real-time constraint:

WCET ≤ period

In other words, regardless of task priority, the worst-case execution time (WCET, Worst-Case Execution Time) of any task does not exceed one period.

If instead it persistently holds that

d_i > quota_i

then the scheduler's stability is broken: work accumulates from one period to the next, and the execution time of newly submitted jobs keeps growing.
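A tiny numeric sketch of this last point: when per-period demand persistently exceeds the quota, the leftover work grows every period, and so does the completion time of newly submitted jobs.

```python
# Minimal sketch: a cgroup whose per-period demand exceeds its quota builds up
# a backlog that grows without bound, so newly submitted work finishes later
# and later.
def backlog_over_time(demand, quota, periods):
    backlog = 0.0
    history = []
    for _ in range(periods):
        backlog = max(0.0, backlog + demand - quota)   # work left over after this period
        history.append(backlog)
    return history

# demand 60ms vs. quota 50ms per 100ms period: the backlog grows 10ms per period
print(backlog_over_time(demand=60, quota=50, periods=5))   # [10, 20, 30, 40, 50]
```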

Impact of using CPU Burst

To improve service quality, we use CPU Burst to allow bursts of CPU usage. What does that do to the stability of the scheduler? The answer is that when multiple cgroups burst at the same time, the scheduling stability constraint and the task real-time guarantee may be broken. The key question then is how likely the two constraints are to still hold. If that probability is high, the real-time behaviour of tasks is guaranteed in the vast majority of periods, and CPU Burst can be used with confidence. If the probability is low, CPU Burst should not be used directly to improve service quality; the deployment density should be reduced and the CPU resource allocation increased first.

So the next concern is how to calculate the probability of the two constraints being broken in a specific scenario.

Assess the size of the impact

The quantitative calculation can be framed as a classic queueing-theory problem and solved by Monte Carlo simulation. The results show that the main factors deciding whether CPU Burst can be used in a given scenario are the average CPU utilization and the number of cgroups. The lower the CPU utilization, or the larger the number of cgroups, the less likely the two constraints are to be broken, and CPU Burst can be used with confidence. Conversely, if CPU utilization is high or the number of cgroups is small, then to eliminate the effect of throttling on process execution you should reduce deployment density and increase resource allocation before enabling CPU Burst.

The problem is defined as follows. There are m cgroups in total, and each cgroup's quota is limited to 1/m. The computing demand (CPU utilization) generated by each cgroup in each period follows a given distribution, and these distributions are mutually independent. Tasks are assumed to arrive at the beginning of each period; if the total CPU demand in a period exceeds 100%, the WCET of that period's tasks exceeds one period, and the excess is carried over and processed together with the new demand generated in the next period. The input is the number of cgroups m and the distribution that each cgroup's CPU demand follows; the output is the probability that WCET > period at the end of each period, and the expectation of the WCET.
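A minimal Monte Carlo sketch of this model is shown below. It is a simplification, not the authors' simulator: demand is drawn from an exponential distribution here, admitted work is served FIFO, and the CPU's capacity is normalized to 100% per period.

```python
# Minimal Monte Carlo sketch of the model above (a simplification, not the
# authors' simulator): m cgroups, quota = 1/m each, i.i.d. per-period demand;
# each cgroup may also spend burst tokens accumulated in earlier idle periods
# (buffer = b * quota). Admitted work that exceeds 100% in a period carries
# over, so that period's WCET exceeds one period.
import random

def simulate(m=10, u_avg=0.5, b=2.0, periods=100_000, seed=0):
    rng = random.Random(seed)
    quota = 1.0 / m
    tokens = [0.0] * m          # accumulated burst tokens per cgroup (capped at b * quota)
    pending = [0.0] * m         # demand not yet admitted (throttled work)
    backlog = 0.0               # admitted work the CPU has not finished yet
    wcet_sum, over = 0.0, 0
    for _ in range(periods):
        submitted = 0.0
        for i in range(m):
            # per-period demand, i.i.d. with mean u_avg * quota (exponential here)
            pending[i] += rng.expovariate(1.0 / (u_avg * quota))
            admit = min(pending[i], quota + tokens[i])
            pending[i] -= admit
            tokens[i] = min(b * quota, tokens[i] + quota - admit)
            submitted += admit
        wcet = backlog + submitted        # completion time (in periods) of this batch, FIFO
        wcet_sum += max(wcet, 1.0)        # finishing within the period counts as WCET == 1
        over += wcet > 1.0
        backlog = max(0.0, backlog + submitted - 1.0)   # the CPU serves 100% per period
    return wcet_sum / periods, over / periods

e_wcet, p_over = simulate()
print(f"E[WCET] ~= {e_wcet:.3f} periods, P(WCET > period) ~= {p_over:.2%}")
```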

Take CPU demand following a Pareto distribution and m = 10/20/30 as an example. The Pareto distribution is chosen for the illustration because it produces more long-tailed burst CPU usage, which is more likely to cause a larger impact. The format of each data item in the table is

e(WCET) / p(WCET > period)

where e(WCET) is the expectation of the WCET measured in periods (the closer to 1, the better), and p(WCET > period) is the probability that the WCET exceeds one period (the lower, the better).

The results match intuition. On the one hand, the higher the CPU demand (CPU utilization), the more easily CPU bursts break the stability constraint, and the larger the expected WCET. On the other hand, the more cgroups there are with independently distributed CPU demand, the lower the probability that they burst at the same time, the easier it is to keep the scheduler stability constraint, and the closer the expected WCET is to one period.

Scenario and parameter settings

We assume there are m cgroups in the whole system, each fairly sharing 100% of the total CPU resources, i.e. quota = 1/m. Each cgroup generates computing demand according to the same law (independent and identically distributed) and hands it to the CPU for execution.

[Figure: the queueing model, in which each cgroup submits its CPU demand to a single CPU each period]

Borrowing the queueing-theory model, we treat each cgroup as a customer and the CPU as the service desk, with each customer's service time bounded by its quota. To simplify the model, we discretize the arrival interval of all customers into a constant, within which the CPU can serve at most 100% of the computing demand. That interval is one period.

Next we define the service time of each customer within a period. We assume the computing demand generated by a customer is independent and identically distributed, with a mean of u_avg times its own quota. A customer's unsatisfied computing demand accumulates across periods; the service time it submits to the service desk in each period is determined by its own computing demand and by the maximum CPU time the system allows it (its quota plus the tokens accumulated in previous periods).

Finally, CPU Burst has a tunable parameter, the buffer, which is the upper limit on the tokens that may accumulate. It determines each cgroup's instantaneous burst capability, and we express its size as b times the quota.
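On a kernel with the feature, this buffer corresponds to the cpu.cfs_burst_us file of the cgroup v1 CPU controller. A minimal sketch, again assuming a hypothetical cgroup named "mygroup":

```python
# Minimal sketch: give an existing cgroup a burst buffer equal to its quota
# (b = 1). Assumes a kernel with CPU Burst (Linux >= 5.14, or the Anolis /
# Alibaba Cloud Linux kernels mentioned above), cgroup v1 at /sys/fs/cgroup/cpu,
# and a hypothetical cgroup "mygroup". Note that some kernels reject burst
# values larger than the quota itself.
from pathlib import Path

CG = Path("/sys/fs/cgroup/cpu/mygroup")

quota_us = int((CG / "cpu.cfs_quota_us").read_text())
(CG / "cpu.cfs_burst_us").write_text(str(quota_us))
```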

We have made the following settings for the above-defined parameters:

The negative exponential distribution is one of the most common distributions in queueing-theory models. Its density function is

f(x) = λ · e^(−λx), x ≥ 0

where λ = 1/u_avg (demand is measured in units of the quota, so the mean demand is u_avg times the quota).

The Pareto distribution is also fairly common in computer scheduling systems; it can model long-tailed delays and therefore brings out the effect of CPU Burst. Its density function is:

f(x) = α · x_m^α / x^(α+1), x ≥ x_m

To keep the tail of the distribution from being too extreme, we set:

[Formula: the values chosen for the Pareto parameters x_m and α]

With this setting, when u_avg = 30% the largest computing demand that can be generated is about 500%.
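For reference, per-period demand for the two distributions could be drawn roughly as follows; the Pareto shape parameter used here is illustrative, since the exact values chosen above are not reproduced.

```python
# Sketch: draw per-period CPU demand (in units of the quota) from the two
# distributions used in the simulation. The Pareto parameters are illustrative
# assumptions, not the authors' exact settings.
import numpy as np

def exponential_demand(u_avg, size, rng):
    # negative exponential with mean u_avg
    return rng.exponential(scale=u_avg, size=size)

def pareto_demand(u_avg, alpha, size, rng):
    # classical Pareto with shape alpha; x_m chosen so the mean equals u_avg:
    # mean = alpha * x_m / (alpha - 1), valid for alpha > 1
    x_m = u_avg * (alpha - 1) / alpha
    return x_m * (1 + rng.pareto(alpha, size=size))

rng = np.random.default_rng(0)
print(exponential_demand(0.3, 5, rng))
print(pareto_demand(0.3, alpha=2.5, size=5, rng=rng))
```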

Data Display

The results of the Monte Carlo simulation with the parameter settings above are shown below. The y-axis of the first chart (WCET expectation) is reversed to better match intuition. The second chart (the probability that WCET equals 1) shows the probability, in percent, that real-time scheduling is preserved.

Negative exponential distribution

[Chart: expected WCET vs. u_avg for different m, negative exponential distribution]

[Chart: probability that WCET equals 1 period vs. u_avg for different m, negative exponential distribution]

Pareto distribution

[Chart: expected WCET vs. u_avg for different m, Pareto distribution]

[Chart: probability that WCET equals 1 period vs. u_avg for different m, Pareto distribution]

Conclusion

In general, the higher u_avg (the demand load) and the smaller m (the number of cgroups), the larger the WCET. The former is an obvious conclusion; the latter holds because with more independently and identically distributed tasks, the aggregate demand is more even, so tasks that need more than their quota and tasks that leave CPU time idle are more likely to complement each other.

Increasing the buffer lets CPU Burst work better and makes the optimization of an individual task more visible; but it also increases the WCET, which means more interference with neighbouring tasks. This, too, matches intuition.

When choosing the buffer size, we recommend deciding based on the computing demand of the specific business scenario (both its distribution and its mean), the number of containers, and your own needs. If you want to raise overall throughput and improve container performance when the average load is not high, you can increase the buffer; conversely, if you want to guarantee scheduling stability and fairness and reduce the interference between containers when the overall load is high, you can reduce the buffer.

Generally speaking, in scenarios where the average CPU utilization is below 70%, CPU Burst does not have a significant impact on neighbouring containers.

Simulation tool and how to use it

After all the dry data and conclusions, let us turn to a question many readers probably care about: will CPU Burst affect my actual business scenario? To answer that, we slightly adapted the tool used for the Monte Carlo simulations so that everyone can test the concrete impact in their own scenarios.

The tool can be obtained here: https://codeup.openanolis.cn/codeup/yingyu/cpuburst-simulator

Detailed usage instructions are included in the README; let us walk through a concrete example below.

Little A wants to deploy 10 containers running the same business on his server. To obtain accurate measurements, he first runs the business normally in one container, bound to a cgroup named cg1, with no CPU limit set, in order to capture the true performance of the business.

He then calls sample.py to collect data (only 1,000 samples are collected for this demonstration; in practice, the more samples the better, conditions permitting):

[Screenshot: running sample.py to collect usage samples from cgroup cg1]

The data is stored in ./data/cg1_data.npy. The summary printed at the end shows that the business occupies about 6.5% of a CPU on average, so with 10 containers deployed the total average CPU utilization is about 65%. (The variance is also printed as a reference: roughly speaking, the larger the variance, the more the workload can benefit from CPU Burst.)
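For readers who just want a feel for what such a collection step involves, the sketch below is not the actual sample.py, only a rough approximation of the idea: read the cgroup's cumulative CPU usage at a fixed interval, keep the per-interval utilization samples, and report their mean and variance.

```python
# Rough sketch of a per-interval CPU usage sampler (not the actual sample.py):
# read cgroup v1 cpuacct.usage (cumulative nanoseconds) for cg1 every 100ms,
# save the per-interval utilization samples, and print mean and variance.
import time
import numpy as np
from pathlib import Path

USAGE = Path("/sys/fs/cgroup/cpuacct/cg1/cpuacct.usage")  # path is an assumption

def collect(samples=1000, interval_s=0.1):
    data = []
    prev = int(USAGE.read_text())
    for _ in range(samples):
        time.sleep(interval_s)
        cur = int(USAGE.read_text())
        data.append((cur - prev) / (interval_s * 1e9))    # 1.0 == one full CPU
        prev = cur
    return np.array(data)

data = collect()
Path("./data").mkdir(exist_ok=True)
np.save("./data/cg1_data.npy", data)
print(f"mean={data.mean():.3f}, var={data.var():.5f}")
```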

Next, he uses simu_from_data.py to compute the impact of configuring 10 cgroups with the same profile as cg1 and setting the buffer to 200%:

[Screenshot: running simu_from_data.py for 10 cgroups with buffer = 200%]

According to the simulation results, enabling CPU Burst has almost no negative impact on the containers in this business scenario, so Little A can use it with confidence.

If you want to learn more about how to use the tool, or change the distribution to explore the simulation results out of theoretical interest, visit the repository linked above.

About the authors

Chang Huaixin (Yizhai) is an engineer on the Alibaba Cloud kernel team, specializing in CPU scheduling.

Ding Tianchen (Yingyu) joined the Alibaba Cloud kernel team in 2021 and is currently working in the scheduling area.


