Smart Request recommendation, K8s resource utilization increased by 252%

Authors

Wang Xiaowei, a FinOps Certified Practitioner and product manager for Tencent Cloud Container Service, focuses on helping customers use Kubernetes efficiently and on delivering cost-reduction and efficiency services.

Yu Yufei, a FinOps Certified Practitioner and expert engineer at Tencent Cloud, works on cloud-native observability, resource management, and cost-reduction and efficiency products.

Why are resource utilization rates so low?

Kubernetes can improve business orchestration and resource utilization, but without additional capabilities the improvement is very limited. According to statistics published earlier by the TKE team in "Kubernetes Cost Reduction and Efficiency Standard Guide | Analysis of the Containerized Computing Resource Utilization Phenomenon", the average resource utilization of TKE nodes is only about 14%, as shown in the following figure.

Why is the resource utilization of Kubernetes clusters still not high?

An important reason is the resource scheduling logic of Kubernetes. When you create a Kubernetes workload, you usually configure a resource Request and Limit for it, which declare how much resource the workload reserves and the maximum it may use. Of the two, Request has the greater impact on utilization.

To keep other workloads from taking the resources their own workloads need, or to cope with resource demand during traffic peaks, users tend to set Request generously. The difference between the Request and actual usage is then wasted, because resources reserved this way cannot be used by other workloads.

An unreasonably large Request is a very common cause of low resource utilization in Kubernetes clusters. In addition, the resources of each node are hard to allocate completely. As shown in the figure below, a node generally has some resource fragments (Leftover). Both factors contribute to low cluster utilization.

How low is the actual utilization of resources?

To set a more reasonable resource Request, you first need to analyze how the business consumes resources. The "Common Resource Waste Scenarios" section of Tencent Cloud Native's "Kubernetes Cost Reduction and Efficiency Standard Guide | Resource Utilization Improvement Tool" analyzes a single workload and finds that at least half of the resources it requests go unused. Because other workloads cannot use these resources either, the waste is serious. Now raise the perspective to the cluster level. The following figure shows the CPU allocation rate and usage rate of a TKE cluster.

The allocation rate is the sum of the CPU Requests of all containers divided by the total number of CPUs across all nodes in the cluster. The usage rate is the sum of the CPU usage of all containers divided by the same total:

The overall CPU allocation rate of the cluster is about 60%, but the actual CPU usage rate does not exceed 10%. In other words, of every 100 yuan a user spends on the cloud, 90 yuan is wasted.
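The two ratios defined above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are assumptions chosen to reproduce the roughly 60% and 10% figures, not data from the cluster in the article.

```python
from typing import Sequence


def cpu_allocation_rate(container_requests: Sequence[float],
                        node_capacities: Sequence[float]) -> float:
    """Sum of container CPU Requests divided by the total CPUs of all nodes."""
    return sum(container_requests) / sum(node_capacities)


def cpu_usage_rate(container_usages: Sequence[float],
                   node_capacities: Sequence[float]) -> float:
    """Sum of container CPU usage divided by the total CPUs of all nodes."""
    return sum(container_usages) / sum(node_capacities)


# Hypothetical cluster: two 50-core nodes, two containers (values in cores).
nodes = [50.0, 50.0]
requests = [20.0, 40.0]
usages = [4.0, 6.0]

allocation = cpu_allocation_rate(requests, nodes)  # 60 / 100
usage = cpu_usage_rate(usages, nodes)              # 10 / 100
```

The gap between the two values is the share of the cluster that is reserved but idle.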

How should Request be set?

There are many ways to improve resource utilization. For details, see "Kubernetes Cost Reduction and Efficiency Standard Guide | Resource Utilization Improvement Tool". This article focuses on how to set Request.

Since it is the Request setting that makes utilization so low, could you simply stop setting Request and shrink the cluster to one-tenth of its original size to solve the problem shown in the figure above? This looks like a simple and efficient method, but it has several serious problems:

Kubernetes automatically assigns each Pod a Quality of Service (QoS) class. Pods that have no Request value are more likely to be evicted when resources are tight, which affects business stability.
The resources of a cluster are not one undivided pool. A cluster is composed of many nodes, and CPU and memory are attributes of the nodes. The capacity of each node has an upper limit, such as a 64-core CPU. A large business may need a cluster with thousands or even tens of thousands of cores, so the cluster contains many nodes, and, as noted above, each node tends to carry leftover resource fragments that workloads cannot use.
The business itself may fluctuate greatly. For example, a subway system is busy during the day and idle at night. A fixed Request value must be set according to the peak, so waste remains significant.
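To make the first point concrete, here is a simplified sketch of how Kubernetes derives a Pod's QoS class from the CPU and memory Requests and Limits of its containers. The function name and the dictionary layout are assumptions for illustration, and quantities are compared as written strings, so "1" and "1000m" would not be treated as equal here.

```python
from typing import Dict, List

# One container: {"requests": {"cpu": "500m"}, "limits": {"cpu": "1"}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Return "Guaranteed", "Burstable", or "BestEffort" (simplified rules)."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            if resource in requests or resource in limits:
                any_set = True
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"  # nothing set: first in line for eviction
    return "Guaranteed" if guaranteed else "Burstable"
```

A Pod whose containers set no Request or Limit lands in BestEffort, which is why dropping Request entirely trades waste for instability.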

Setting Request has therefore always been a hard problem for operations and development teams. If Request is too small, it easily hurts runtime performance. If it is too large, it inevitably causes waste.

Request smart recommendation

Is there an effective tool that can automatically recommend or even set the Request value based on the characteristics of the business itself?

Such a tool would greatly reduce the burden on development and operations teams. To solve this problem, TKE Cost Master launched the Request smart recommendation tool. Users can read the recommended values through the standard Kubernetes API (for example: /apis/recommendation.tke.io/v1beta1/namespaces/kube-system/daemonsets/kube-proxy).
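As a small sketch, the example path above can be assembled from its parts. The helper name is hypothetical, and whether resource kinds other than the daemonsets shown in the example are served under the same group is an assumption, not something the article states.

```python
def recommendation_path(namespace: str, kind_plural: str, name: str,
                        group_version: str = "recommendation.tke.io/v1beta1") -> str:
    """Build the API path for a workload's recommendation (hypothetical helper)."""
    return f"/apis/{group_version}/namespaces/{namespace}/{kind_plural}/{name}"


# Reproduces the example path given in the article.
path = recommendation_path("kube-system", "daemonsets", "kube-proxy")
```

With access to the cluster, a path like this could be fetched with `kubectl get --raw <path>`. The shape of the response is not described in this article.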

After the feature is enabled, the Request recommendation components pull monitoring metrics for the workloads in the cluster from Tencent Cloud Monitoring (Prometheus, InfluxDB, or third-party cloud vendors). For a monitored metric such as memory usage, they compute the P99 value and multiply it by a safety factor (for example, 1.15) to obtain the recommended Request.
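The calculation described above, P99 multiplied by a safety factor, can be sketched as follows. The nearest-rank percentile method and the function names are assumptions; the article does not say how the product computes P99 or over what time window.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples: Sequence[float],
                      safety_factor: float = 1.15,
                      pct: float = 99.0) -> float:
    """Recommended Request = P99 of historical usage times a safety factor."""
    return percentile(samples, pct) * safety_factor


# Hypothetical memory usage samples in MiB: steady at 100 with one spike.
memory_mib = [100.0] * 99 + [1000.0]
recommended = recommend_request(memory_mib)  # P99 is 100, so about 115 MiB
```

Using P99 rather than the maximum keeps a single outlier, like the spike above, from inflating the recommendation.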

As for Limit, the recommended Limit keeps the ratio between the Request and Limit that were initially configured. For example, if the initial CPU Request is 1000m and the Limit is 2000m, the ratio of Request to Limit is 1:2. If the newly recommended CPU Request is 500m, the recommended Limit is 1000m.
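A minimal sketch of this ratio-preserving rule, using integer millicores. The function name and the rounding choice are assumptions.

```python
def recommend_limit(initial_request_m: int, initial_limit_m: int,
                    recommended_request_m: int) -> int:
    """Scale the Limit so that Request:Limit keeps its initial ratio (values in millicores)."""
    if initial_request_m <= 0:
        raise ValueError("initial request must be positive")
    return round(recommended_request_m * initial_limit_m / initial_request_m)


# The example from the text: 1000m/2000m initially, 500m newly recommended.
new_limit = recommend_limit(1000, 2000, 500)  # 1000
```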

For more information about using Request smart recommendation, see the Request smart recommendation product documentation.

Request smart recommendation refers to the historical peak resource consumption of the application and gives a relatively "reasonable" and "safe" Request value, which can greatly reduce the resource waste or business instability caused by unreasonable Request settings.

For example, in the following cluster the business actually uses about 10 cores, but the manually configured Request is 60 cores. Setting the Request to 17 cores is in fact enough. Utilization then rises from 16.7% (= 10/60) to 58.8% (= 10/17), a relative increase of 252% (= (58.8 - 16.7)/16.7), and the requested CPU drops by 71.7% (= (60 - 17)/60).
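The arithmetic in this example can be re-derived in a few lines. The variable names are illustrative. Note that the 252% figure is computed from the rounded percentages, as in the text.

```python
def utilization_pct(usage_cores: float, request_cores: float) -> float:
    """Utilization as a percentage of the Request, rounded to one decimal place."""
    return round(usage_cores / request_cores * 100, 1)


usage, old_request, new_request = 10, 60, 17

before = utilization_pct(usage, old_request)  # 16.7
after = utilization_pct(usage, new_request)   # 58.8

# Relative increase in utilization, from the rounded percentages.
relative_increase = round((after - before) / before * 100)  # 252

# Share of the originally requested CPU that is released.
cpu_saving = round((old_request - new_request) / old_request * 100, 1)  # 71.7
```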

AHPA

Of course, Request smart recommendation is not a silver bullet, because the resource consumption of applications is not static. Many applications are tidal: the resources needed at the business peak can be several times or even dozens of times those needed in the trough. A Request set for peak demand makes the business hold a large amount of unused resources during idle periods, so the application's average resource utilization is still low. To optimize further, you need elastic scaling.

At this stage, HPA (Horizontal Pod Autoscaler) is the most commonly used elasticity tool in the Kubernetes field. HPA can handle the elastic resource needs of periodic business traffic to a certain extent, but it lags. Specifically, you usually first define a target metric for HPA, such as 60% CPU utilization. Only after the monitoring components observe that load has increased and the utilization threshold has been reached does HPA scale the number of replicas up or down.
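The reactive nature of HPA is visible in its core scaling rule, which the Kubernetes documentation gives as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements only that formula. It leaves out the tolerance band, stabilization windows, and min/max replica bounds that the real controller applies, and the function name is an assumption.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Core HPA rule: scale replicas in proportion to observed / target utilization."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


# With a 60% CPU target, 4 replicas observed at 90% become 6.
scaled_up = desired_replicas(4, 90, 60)    # 6
# The same 4 replicas observed at 30% shrink to 2.
scaled_down = desired_replicas(4, 30, 60)  # 2
```

The input is always a utilization that has already been observed, so new replicas start only after load has risen. That is the lag described above.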

By observing the applications that a large number of internal and external users run on Tencent Cloud, we found that the resource usage of many businesses is periodic over time, especially for businesses that directly or indirectly serve "people". The period is determined by the regularity of people's daily activities. For example:

People are used to ordering takeout at noon and evening
Morning and evening are rush hours
Even for services without obvious patterns, such as search, the number of requests at night is much lower than during the day

For applications of this type, it is natural to infer the next day's metrics from the data of the past few days, or to infer next Monday's traffic from last Monday's data. By predicting future metrics, you can manage application instances better, stabilize the system, and reduce costs at the same time.
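One simple way to express this idea is a seasonal average: predict each slot of the next cycle as the mean of the same slot in the last few cycles. This is only an illustrative baseline under stated assumptions (a known, fixed period and evenly spaced samples), not the prediction algorithm used by the product.

```python
from statistics import mean
from typing import List, Sequence


def seasonal_forecast(history: Sequence[float], period: int,
                      cycles: int = 3) -> List[float]:
    """Forecast the next `period` points from the same slots of the last `cycles` cycles."""
    if period <= 0 or len(history) < period:
        raise ValueError("need at least one full period of history")
    usable = min(cycles, len(history) // period)
    recent = history[-usable * period:]
    # recent[slot + k * period] is the value of slot `slot` in the k-th recent cycle.
    return [mean(recent[slot + k * period] for k in range(usable))
            for slot in range(period)]


# Hypothetical load with a period of 3 samples, two cycles of history.
history = [1.0, 5.0, 2.0, 3.0, 7.0, 4.0]
forecast = seasonal_forecast(history, period=3, cycles=2)  # [2.0, 6.0, 3.0]
```

For daily patterns sampled hourly the period would be 24, and for weekly patterns 168.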

CRANE is the technical foundation of TKE Cost Master. It focuses on optimizing resource utilization through a variety of technologies, thereby reducing users' cloud costs. The Predictor module in CRANE can automatically identify the periodicity of monitoring metrics (such as CPU load, memory usage, and request QPS) of applications in a Kubernetes cluster and produce a forecast series for a period of time in the future. On this basis, we developed AHPA (advanced-horizontal-pod-autoscaler), which can identify applications suitable for horizontal autoscaling, make scaling plans, and perform scaling operations automatically. It uses the native HPA mechanism, but it works from predictions and scales applications out ahead of time instead of reacting passively to monitoring metrics. Compared with native HPA, AHPA removes the need for manual configuration and the scaling lag, freeing operations staff from this work. AHPA mainly has the following characteristics:

Reliability: guarantees scalability and availability
Responsiveness: scales out quickly and responds rapidly to high load
Resource efficiency: reduces costs
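The idea of scaling ahead of predicted load can be sketched as follows. This is not CRANE's or AHPA's implementation. The function name, the lead-time rule (provision for the highest demand within the next few steps), and every parameter are assumptions made for illustration.

```python
from typing import List, Sequence


def proactive_plan(forecast_m: Sequence[int], replica_request_m: int,
                   target_utilization_pct: int = 60, lead_steps: int = 1,
                   min_replicas: int = 1) -> List[int]:
    """Turn a CPU forecast (millicores per step) into a replica count per step."""
    capacity = replica_request_m * target_utilization_pct  # usable millicores x 100
    needed = []
    for load_m in forecast_m:
        # Integer ceiling division avoids floating-point rounding surprises.
        replicas = (load_m * 100 + capacity - 1) // capacity
        needed.append(max(min_replicas, replicas))
    # Provision each step for the peak within the next `lead_steps` steps,
    # so replicas are already running when the predicted load arrives.
    return [max(needed[i:i + lead_steps + 1]) for i in range(len(needed))]


# Hypothetical forecast with one peak; 1000m per replica at a 50% target.
plan = proactive_plan([1000, 1000, 6000, 1000], replica_request_m=1000,
                      target_utilization_pct=50, lead_steps=1)
# plan == [2, 12, 12, 2]: the scale-out happens one step before the peak.
```

A purely reactive rule would reach 12 replicas only after observing the peak. The lead window trades a little extra capacity for readiness.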

The following figure shows the actual operation effect of the project:

The red line is the actual resource usage of the workload
The green line is the predicted resource usage of the workload
The blue line is the resource amount given by the elastic scaling recommendation

CRANE and AHPA will be open-sourced soon, so stay tuned.

For more on the principles and real cases of cloud-native cost optimization, see the "Source of Cost Reduction-Cloud Native Cost Management White Paper". Drawing on best practices in internal and external cloud-native cost management and on outstanding industry cases, Tencent proposes in it a systematic methodology and best-practice path for cloud-native cost optimization. The white paper aims to help enterprises optimize their cloud costs and make full use of the efficiency and value of cloud native.

To download the white paper, reply "white paper" in the [Tencent Cloud Native] official account.

About us

For more cases and knowledge about cloud native, follow the official account of the same name, [Tencent Cloud Native].

Perks:

① Reply [Manual] in the official account to get the "Tencent Cloud Native Roadmap Manual" and "Tencent Cloud Native Best Practices".

② Reply [series] in the official account to get the "15 series of 100+ practical original cloud native articles" collection, covering Kubernetes cost reduction and efficiency, K8s performance optimization practices, best practices, and other series.

③ Reply [white paper] in the official account to get the "Tencent Cloud Container Security White Paper" and the "Source of Cost Reduction-Cloud Native Cost Management White Paper v1.0".

[Tencent Cloud Native]: new products, new technologies, new events, and news from the cloud. Scan the QR code to follow the official account of the same name and get more practical content in time!
