
A brief description of elastic scaling in the field of cloud computing

Elastic scaling, also known as auto scaling, is a common mechanism in cloud computing. It flexibly expands or shrinks server capacity according to the load on the servers and a set of predefined rules.
What elastic scaling means in different scenarios:

  • For companies whose services run in self-built data centers, elastic scaling usually means putting some servers to sleep at low load, thereby saving electricity (and the water bills for cooling the machines).
  • For companies hosted on the cloud, auto scaling can mean lower costs, because most cloud providers charge based on total usage rather than maximum capacity.
  • Even companies that cannot reduce the total computing power they run or pay for at any given moment can still reduce the load on their servers during low-traffic periods.
  • Elastic scaling solutions can also be used to replace instances in an abnormal state, which offers some protection against hardware, network, and application failures.
  • In situations where production workloads are constantly changing and unpredictable, elastic scaling can provide longer uptime and higher availability.

Quoted from: https://en.wikipedia.org/wiki/%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9

Three key elements of elastic scaling

1. What characteristics and attributes to scale on

Elastic scaling, as the name implies, is a mechanism that allows certain objects to be expanded and contracted flexibly. In cloud computing and container-related fields there are many kinds of elastic scaling capabilities: scaling based on system load, scaling based on business logs, scaling based on resource requests, and so on. The most commonly used are as follows:

  1. Scaling based on system load metrics
  • Usage scenario: When your application takes on more load, it usually needs more CPU and memory. You can set a target CPU or memory utilization, and the system automatically adjusts the number of replicas to carry the varying load, preventing both waste from low utilization and overload the application cannot bear.
  • Limitations: Sometimes the application load rises while CPU and memory utilization stay low; in that case scaling on system load metrics is ineffective. Choosing which load metric to use, and setting the utilization threshold, also takes considerable experience.
  2. Scaling based on business logs
  • Usage scenario: Business logs are recorded and stored specifically so that the application's actual load can be derived through log analysis, and capacity is then expanded or contracted automatically based on that analysis.
  • Limitations: You need log storage and analysis tools; log volumes are generally very large, and log-based scaling is prone to false positives and missed triggers.
  3. Scaling based on resource requests
  • Usage scenario: When an application is not suited to horizontal scaling, you can adjust its resource requests instead. Whereas method 1 scales horizontally by changing the replica count, this method scales vertically by changing the amount of resources requested by the container.
  • Limitations: This currently requires rebuilding the container, which may interrupt the service; vertical scaling also depends on the capacity of the node the container is running on, and if the node has no spare resources, vertical expansion is impossible.
  4. Scaling based on events
  • Usage scenarios: For example, when your business processes tasks from a Kafka message queue, each new Kafka topic needs a new replica to handle it; or each new task row in the database automatically spawns a new replica to carry that task.
  • Limitations: This depends entirely on event triggering, but events can take widely varying amounts of time to process and impose widely varying loads, so identical replicas cannot respond flexibly.
    Of course, objects can also be scaled on other characteristics and attributes; not all of them are enumerated here. Each business selects the characteristics and attributes it needs. Characteristics and attributes are the foundation of elastic scaling.
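As a concrete illustration of method 1, here is a minimal, hypothetical sketch of a load-based replica decision. The function name, target utilization, and replica bounds are illustrative assumptions, not any product's API:

```python
import math

# Hypothetical sketch of method 1: pick a replica count so that per-replica
# utilization moves back toward a target (names and thresholds illustrative).
def desired_replicas(current_replicas, cpu_utilization,
                     target_utilization=0.6, min_replicas=1, max_replicas=10):
    desired = math.ceil(current_replicas * cpu_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(4, 0.9))   # overloaded: 4 * 0.9 / 0.6 -> 6 replicas
print(desired_replicas(4, 0.3))   # underused: 4 * 0.3 / 0.6 -> 2 replicas
```

Note the clamping to `min_replicas`/`max_replicas`: it is what prevents a noisy metric from scaling the workload to zero or to an unbounded count.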

2. What strategy to apply

After the data for the above characteristics and attributes is collected, strategies and judgment rules are needed. In summary:

  1. Under what circumstances and boundary conditions should the objects be expanded, by how much, which objects, and how?
  2. Under what circumstances and boundary conditions should the objects be contracted, by how much, which objects, and how?

Take Kubernetes Cluster Autoscaler as an example. The node scale-down strategy in Tencent Cloud Container Service works as follows:

  1. CA (Cluster Autoscaler) detects that node utilization (the maximum of CPU utilization and memory utilization) is below the configured threshold. When calculating utilization, you can choose to exclude the resources occupied by DaemonSet Pods.
  2. CA judges whether the cluster state allows scale-down; the following requirements must be met:

    • Node idle time requirement (10 minutes by default).
    • Buffer time since the last cluster scale-up (10 minutes by default).
  3. CA judges whether the node meets the scale-down conditions. You can configure the following exclusion conditions as needed (nodes that match them will not be scaled down by CA):

    • Nodes with local storage.
    • Nodes running Pods in the kube-system namespace that are not managed by a DaemonSet.
  4. CA evicts the Pods on the node and then releases/shuts down the node (prepaid, annually billed nodes are not released).

    • Completely idle nodes can be scaled down concurrently (the maximum concurrency is configurable).
    • Nodes that are not completely idle are scaled down one by one.
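The scale-down checks above can be sketched roughly as follows. The types, field names, and default values are illustrative assumptions, not the real Cluster Autoscaler code:

```python
from dataclasses import dataclass

# Illustrative types only -- this is not the real Cluster Autoscaler code.
@dataclass
class Node:
    utilization: float          # max(CPU%, MEM%), DaemonSet Pods optionally excluded
    idle_minutes: float
    has_local_storage: bool = False
    has_unmanaged_kube_system_pod: bool = False

def eligible_for_scale_down(node, threshold=0.5, min_idle_minutes=10):
    if node.utilization >= threshold:         # step 1: must be under-utilized
        return False
    if node.idle_minutes < min_idle_minutes:  # step 2: idle long enough
        return False
    if node.has_local_storage:                # step 3: protected node
        return False
    if node.has_unmanaged_kube_system_pod:    # step 3: protected node
        return False
    return True                               # step 4: evict Pods, release node

print(eligible_for_scale_down(Node(utilization=0.2, idle_minutes=15)))  # True
```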

The above is Kubernetes' processing logic for node scale-down, and it represents the strategy part of the three key elements of elastic scaling. In short, the strategy is the most critical factor in determining whether an elastic scaling capability matches the business scenario.

3. What objects to scale

On cloud service providers, the object of elastic scaling is most often the number of servers, but there are many more possibilities: the resource configuration (CPU/memory) of cloud servers, the Pods in Kubernetes that carry user workloads, or any other cloud product a company uses. As long as a resource can be consumed on demand, it can become an object of elastic scaling. The essence and purpose of elastic scaling on the cloud is to pay on demand for the scaled objects.

The relationship between elasticity and cloud computing costs

What costs can be reduced by elastic scaling

Tencent Cloud's cloud-native team plans to release a cloud-native white paper summarizing the cost experience of 1000+ enterprises. It is organized into three parts: understanding cost -> controlling cost -> optimizing cost. Using cloud elastic scaling is one of the three major ways companies optimize costs.

1. Elastic scaling can reduce the cost of IT equipment

According to "Efficiency | Analysis of the Phenomenon of Resource Utilization of Containerized Computing", making full use of elastic scaling is one of the keys to improving resource utilization and reducing resource costs. Compared with not using elastic scaling, overall resource utilization can be increased by 20%-30% or more.
The Tencent Cloud native team's containerized resource utilization maturity model defines level 2 as the business using containers together with cloud elastic scaling capabilities: combining Kubernetes' HPA, VPA, CA and similar capabilities to expand at peak and contract when idle, greatly improving resource utilization.

2. Elastic scaling can improve operation and maintenance efficiency and reduce personnel costs

When elastic scaling is not used, operations staff may encounter the following scenarios:
● A sudden surge in traffic or a CC attack leaves too few machines, making your service unresponsive
● Resources are provisioned for peak traffic, but traffic rarely reaches the peak, wasting the investment
● Staff stand by to handle frequent capacity alarms, requiring many manual changes

Using elastic scaling with automatic configuration not only frees staff from manual resource changes but also further improves business stability.

Quote from: https://cloud.tencent.com/document/product/377/3154

The key points where elastic scaling affects cost

1. The key points where elastic scaling affects IT resource costs

1.1 Sensitivity

Sensitivity can be measured as the time from triggering a scaling action to its actual completion: the shorter the time, the higher the sensitivity.
Higher sensitivity not only saves the IT resource cost of that time gap but can also be decisive in certain business scenarios.
Sensitivity can be improved along several dimensions: HPA scaling speed, Cluster Autoscaler scaling speed, and the way the business itself scales.
Sensitivity is a key evaluation metric for the elastic scaling features of Tencent Cloud's container products. At the infrastructure layer, the main consideration is scaling speed. Taking node scale-out efficiency as an example, measured times for TKE to add nodes through a node pool are as follows:

Test procedure:

  1. Create a TKE cluster and scale out by 50, 100, and 200 nodes respectively
  2. Record the time from the start of the batch scale-out to the completion of initialization
  3. Release the newly created nodes
  4. Repeat the test 5 times and record each batch scale-out time

Add 50 nodes in batch:

| - | 1st | 2nd | 3rd | 4th | 5th |
| --- | --- | --- | --- | --- | --- |
| Time | 3 min 16 s | 3 min 33 s | 3 min 59 s | 4 min 5 s | 3 min 13 s |

Add 100 nodes and 200 nodes in batch:

| - | 1st | 2nd | 3rd | 4th | 5th |
| --- | --- | --- | --- | --- | --- |
| Time | 4 min 55 s | 5 min 07 s | 5 min 02 s | 5 min 11 s | 5 min 10 s |

Of course, from the moment the business actually needs to scale to the moment the workload is Ready, the Kubernetes service level involves not only node scale-out but also Pod HPA and the efficiency of monitoring or log metric collection and analysis. Tencent Cloud Container Service will continue to build its elastic scaling products around improving sensitivity.

1.2 Accuracy

In the field of elastic scaling, accuracy mainly means scaling at the right time, by the right amount, with the right attributes of the scaled object (such as the cloud server model). The higher the accuracy, the better the fit with the business: expansion will be neither so large that it wastes cost nor so small that it fails to solve the business problem, and contraction will be neither so aggressive that it causes business failures nor so timid that it wastes resources.
Accuracy is closely tied to the scaling strategy and algorithm.
For Kubernetes services, accuracy, like sensitivity, is spread across the various scaling components. Taking HPA as an example, accuracy is mainly embodied in its default scaling algorithm; for details see the Kubernetes documentation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

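The formula above can be evaluated directly. A minimal Python rendering of it:

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, desired_metric):
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * current_metric / desired_metric)

print(hpa_desired_replicas(3, 80, 50))  # 3 replicas at 80% CPU, 50% target -> 5
print(hpa_desired_replicas(5, 30, 50))  # load has dropped -> 3
```

The ceiling means HPA always rounds up, so it errs on the side of slightly more capacity rather than slightly less.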
The default HPA scaling strategy satisfies most scenarios, but there are many more business scenarios, so there are also components that match specific businesses with higher accuracy for scaling Pods, for example:
● When business load correlates with time: CronHPA (the HPC feature in Tencent Container Service) controls scaling at precise times.
● Event-based auto scaling with KEDA, which swaps in other metric data sources to match requirements such as offline computing scenarios.
● ......
The community will surely produce ever richer Pod-level scaling components to fit diverse business scenarios and improve the accuracy of elastic scaling.

2. The key points where elastic scaling affects operation and maintenance costs

2.1 Degree of automation

If the degree of automation needs a measurable value, consider the time operations or development staff invest in IT resource management: the less time, the higher the degree of automation and the lower the labor cost. This time can be further split into time spent scaling IT resources and time spent on maintenance such as replacing failed instances.
To increase the degree of automation of elastic scaling, understanding its basic working principles is the first requirement. The working principles of several basic elastic scaling components of the Kubernetes service are detailed below.
On top of understanding how elastic scaling works, companies often integrate it into their own operation and maintenance platforms so that it becomes part of the operations system serving business demands. Automation therefore also raises the bar for cloud providers on the configurability of elastic scaling and the ease of use of its APIs. If you use the elastic scaling APIs of Tencent Cloud Container Service, you are welcome to offer suggestions for the product.

2.2 Observability

Observability is treated as a key factor in operations cost because the automation of Kubernetes elastic scaling still cannot be fully separated from operations staff. Good observability reduces the mental burden on IT managers, makes the operation of the business more transparent, and lets humans handle more quickly the tasks that automation cannot.
Observability covers the inventory and management of scaled objects, their basic system metrics, monitoring of their running state, fault alarms, and so on.
Cloud vendors' products, including Tencent Cloud's container series, provide some basic observability capabilities. You can also build your own observability platform with community dashboard tools such as Grafana.

Is elastic scaling suitable for every business?

Scaling out is relatively low-risk: the biggest impact is that spending may increase, which is safe for the business itself. But elastic scaling includes contraction as well as expansion. After the business has been scaled in, can capacity be expanded quickly enough for the next burst of traffic? If the freed resources have been seized by other businesses, or the cloud resources are sold out, expanding on short notice is a risky affair.
When business applications depend on each other, should scaling one application up or down also scale the other? Will there be a chain reaction? These are risk points that may cause system failures.
As described above, elastic scaling spans many characteristics, attributes, strategies, and objects, and any combination can work. Which is the best and most suitable? Answering that often requires deep technical accumulation and experience, and it is hard to automate.
Improper use of elasticity has caused bills to skyrocket. You need to understand how elastic scaling works in order to use it accurately, reduce business costs, and improve business stability. Before using Kubernetes' elastic scaling capabilities, it is recommended to read the relevant official or GitHub documentation:

· ClusterAutoScaler: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

· HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

· VPA: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler

Problems in Kubernetes elastic scaling

Problems with sensitivity

Elastic scaling has to watch for "changes" (the characteristics and attributes described above) before it can scale the target "objects" according to the pre-defined "strategies". However, it often takes a long time for an increase in actual business traffic to show up as a "change" in load, for the monitoring component to observe that change, and finally for the scaling action to be triggered.

In addition, to guarantee a high Pod QoS class and prevent important Pods from being evicted, users often set the container's Request equal to its Limit. The actual resource utilization is then at most 100%. Suppose the user sets an HPA threshold of 90%: each expansion can multiply the replica count by at most about 1/0.9 ≈ 1.11. If traffic suddenly grows to the point where twice the current resources (twice the replicas) are needed, it takes about 8 expansion rounds to get there (1.1^8 ≈ 2.14). Obviously this is too many steps and too long a cycle.
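The step count in this example can be checked with a short calculation, assuming as above that each round can multiply the replica count by roughly 1.1:

```python
def rounds_to_double(step_factor=1.1):
    """Scaling rounds needed before the replica count at least doubles,
    when each round can multiply the count by at most `step_factor`."""
    rounds, scale = 0, 1.0
    while scale < 2.0:
        scale *= step_factor
        rounds += 1
    return rounds

print(rounds_to_double(1.1))  # 8 rounds: 1.1**8 is about 2.14
```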

There is also the time-window setting: the current HPA controller has separate time windows for scale-up and scale-down, within which HPA's target replica count is held stable; the scale-up window is 3 minutes and the scale-down window is 5 minutes. If the window is set too small, the replica count may change frequently and destabilize the cluster; if it is set too large, scaling responds too slowly to cope effectively with sudden traffic.
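The trade-off can be seen in a rough simulation of a scale-down stabilization window, in the style HPA uses (acting on the highest recommendation seen inside the window). The function is a sketch, not HPA's actual code, and the tick unit is arbitrary:

```python
# Rough sketch of HPA-style scale-down stabilization: act on the highest
# replica recommendation seen within the window, so brief dips in load do
# not cause an immediate scale-down (window is measured in ticks).
def stabilized(recommendations, window):
    out = []
    for i in range(len(recommendations)):
        out.append(max(recommendations[max(0, i - window + 1): i + 1]))
    return out

recs = [10, 10, 4, 4, 4, 4, 4, 4]   # load drops for good at tick 2
print(stabilized(recs, window=1))   # no window: scales down immediately
print(stabilized(recs, window=5))   # 5-tick window: scales down only at tick 6
```

A small window tracks every dip (unstable replica counts); a large window delays every scale-down by up to the window length.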

Problems affecting accuracy

Expansion may fail, which can be fatal for sudden-traffic scenarios. For example, the resources on the cloud may be sold out, and expansion is then impossible.

Cluster Autoscaler's current node scaling mainly depends on whether Pods are Pending; that signal is too one-dimensional, and its accuracy needs improvement. Moreover, Pod Pending only considers the declared resource requests and limits, not actual resource usage. Since over-provisioning Pods is a common practice on the business side, this hurts the accuracy of elastic scaling.

A cluster contains CVM instances of multiple specifications; which should be scaled down first? For example, scaling down large nodes may cause contention and starvation after containers are rescheduled, while scaling down small nodes may leave the cluster with only large nodes in the end.

The degree of automation

Today's elastic scaling methods are not automated enough. Although scaling is eventually automatic, it rests on a large amount of up-front manual configuration. That configuration demands strong business experience and a deep understanding of every aspect of Kubernetes elastic scaling.

Taking HPA as an example, TKE currently supports five categories and a total of 30 different metrics; for details, see the TKE auto-scaling metric description. TKE also supports elastic scaling on custom metrics. With so many metrics, how do you choose? Which metric fits your business best? What is an appropriate target value? How should the replica-count range be set? These are the key factors that affect elastic scaling.

Observability problem

Tracing which cause, at which time, produced which scaling result still takes real effort with existing monitoring systems. An existing monitoring system usually watches a single metric: it can monitor changes in replica count, changes in the scaled objects, resource utilization, and even events and logs, but combining these organically and correlating them is rather difficult. The observability of elastic scaling today still requires manually aggregating and analyzing all this monitoring data, which demands heavy customization and remains a cumbersome task for operations staff.

Other problems

1. Scaling dimensions

The current HPA monitors Pod-level metrics, but some Pods have multiple containers. If the main business container is under high load while a sidecar container is under low load, the average resource utilization across all containers of the Pod can stay below the scale-up threshold, so scaling is never triggered and the configured elasticity is ineffective. There is also a higher-dimension problem: again taking HPA as an example, its target is the Pod level, but products are usually application-centric. HPA lacks a "linkage effect": for example, can scaling one Pod automatically trigger the scaling of other Pods in the same application?
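The sidecar-dilution problem can be shown with a tiny calculation. This uses a hypothetical two-container Pod and a plain average; real HPA weighs usage against each container's requests, so this is a deliberate simplification:

```python
# Hypothetical two-container Pod: averaging utilization across containers
# lets an idle sidecar mask a saturated main container.
def pod_avg_utilization(container_utils):
    return sum(container_utils.values()) / len(container_utils)

pod = {"app": 0.95, "sidecar": 0.05}   # main container is saturated
avg = pod_avg_utilization(pod)
print(avg)          # 0.5
print(avg >= 0.9)   # False: a 90% threshold never fires, so no scale-up
```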

2. Eviction choices

Suppose a Pod's resource utilization is very low and its resources are scaled down, after which the freed resources are taken by other workloads. If that Pod's load then suddenly rises but the node has no remaining resources, should this Pod be evicted, or should the other Pods be evicted?

The elastic scaling vision of Tencent Cloud Container Service

Relying on the various elastic scaling services provided by the Tencent Cloud native team, we are committed to helping customers automate resource management, reduce labor and maintenance costs and resource waste, and improve the sensitivity, accuracy, automation, and observability of elastic scaling. For details, see the earlier article "Encyclopedia of Resource Utilization Tools", a standard guide to cost reduction and efficiency improvement.

Readers are welcome to try it out and put forward your valuable suggestions.

