Why do you need autoscaling
Because human activity follows regular patterns, Internet services typically show periodic load. The most common pattern is a daily cycle: busy during the day and idle at night. Under the traditional fixed-resource model, resources are still held during off-peak hours even though utilization is low, which is wasteful. Automatically scaling services (possibly combined with automatic scaling of cluster resources) expands capacity during peaks and reclaims resources during troughs, which translates into real cost savings in scenarios such as pay-as-you-go public cloud billing.
Under the fixed-resource model, a service also tends to keep redundant capacity on hand to absorb traffic growth. With autoscaling in place, much of that redundancy can be removed while traffic spikes are handled with more confidence.
In the K8S era, services can be scaled easily through declarative APIs, and the Horizontal Pod Autoscaling (hereinafter HPA) feature provided by K8S can automatically scale the number of service instances based on metrics such as CPU, QPS, and task queue length.
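As a concrete illustration, a minimal HPA manifest that scales a hypothetical Deployment named `my-service` on CPU utilization might look like the following (the workload name and the target utilization are placeholders, not values from this article):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service          # hypothetical workload name
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # example threshold
```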
This article summarizes the problems encountered when putting HPA into production, and how Crane's EffectiveHPA (hereinafter EHPA) helps solve them.
Problems with putting HPA into production
HPA works by having the HPA Controller compute the desired number of service instances from the configured metrics and their scaling thresholds, and then carry out the scaling action. Autoscaling is challenging on two fronts: on one hand it raises the bar for the service itself, because a service that can be scaled up or down at any time must be able to start up and go offline gracefully; on the other hand it tests the functionality and robustness of the HPA infrastructure. The following focuses on problems of the latter kind.
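For reference, the desired replica count that the HPA Controller computes for each metric follows the standard formula documented for Kubernetes HPA:

$$
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
$$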
- Scale-up lag: scaling based on monitoring metrics is inherently reactive. There is a delay between the moment a metric is observed to cross its threshold and the moment the newly started instances are ready, so scale-up only begins once the peak has already arrived. If the scale-up mechanism then misbehaves (insufficient cluster resources, a dependency failure preventing startup, etc.), the crisis becomes even worse.
- Monitoring failures affect service stability: because HPA scales automatically based on monitoring metrics, a monitoring-system failure after a low-traffic scale-down means the service cannot scale back up for the next peak, which severely impacts the stability of online services. Previously, a monitoring failure only affected a service's observability.
- Instance-count jitter caused by metric spikes: some services' metrics are not smooth (especially system metrics such as CPU usage) and show intermittent spikes. Feeding such metrics to HPA makes the number of instances fluctuate, leading to unexpected repeated scale-ups and scale-downs. The HPA stabilization window parameter (a configuration sketch follows this list) can reduce the jitter, but its effect is limited.
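For completeness, the stabilization window mentioned above is configured through the HPA's `behavior` field; the fragment below is a minimal sketch (the 300-second value is just an example):

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # example: look at the last 5 minutes of recommendations before scaling down
```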
How EHPA solves these problems
Scaling up in advance with a prediction algorithm
For services whose metrics are periodic, the DSP prediction algorithm is used to scale up ahead of time, so that expansion completes before the peak arrives. Based on the metric's historical data, DSP uses the Fast Fourier Transform to move the metric from the time domain into the frequency domain; that is, it decomposes a relatively complex function (the metric's historical curve) into a superposition of several simple functions, filters them, and then fits the metric's future values from the components that remain.
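Conceptually (a simplified sketch of the idea rather than Crane's exact implementation), the historical series is decomposed into sinusoidal components, the low-amplitude ones are discarded, and the remaining components are extrapolated into the future:

$$
x(t) \approx \sum_{k:\ A_k \ge \theta} A_k \cos\!\left(2\pi f_k t + \phi_k\right),
\qquad
\hat{x}(t+\Delta) = \sum_{k:\ A_k \ge \theta} A_k \cos\!\left(2\pi f_k (t+\Delta) + \phi_k\right)
$$

where the amplitudes $A_k$, frequencies $f_k$, and phases $\phi_k$ come from the FFT of the historical data, and $\theta$ is the amplitude threshold used for filtering.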
Once the prediction data is available, EHPA adds a prediction metric that works alongside the original real-time metric to drive autoscaling. The prediction metric uses the same threshold as the original metric, and scale-up is triggered as soon as either of the two crosses it. As the value of the prediction metric, EHPA uses the maximum of the predicted data over a window of future time (controlled by the PredictionWindowSeconds parameter), which is what makes scaling up in advance possible.
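Below is a hedged sketch of an EHPA object with prediction enabled, modeled on the examples shipped with the Crane project; the workload name is a placeholder and the exact field names should be checked against the version you run:

```yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service            # hypothetical workload name
  minReplicas: 2
  maxReplicas: 20
  scaleStrategy: Auto
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  prediction:
    predictionWindowSeconds: 3600   # how far ahead the prediction metric looks
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
```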
Falling back to prediction data
When the monitoring data becomes unavailable but the prediction data is still valid, scale-up can still be completed based on the prediction data. At present the validity period of the prediction data is also governed by the PredictionWindowSeconds parameter described above, so it is not very long. Moreover, some services' metrics are not periodic and therefore cannot be predicted at all, so this fallback should currently be regarded as only a partial mitigation.
Mitigating jitter with smooth prediction data
Low-amplitude components are filtered out before the prediction data is fitted, which effectively removes the spiky part of the metric curve, so the predicted curve is comparatively smooth. In addition, EHPA's MarginFraction parameter can be used to inflate the prediction: the predicted data is multiplied by (1 + MarginFraction), so the prediction curve envelops more of the spikes and the instance count jitters less.
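The filtering and margin behavior can be tuned through the DSP parameters of the prediction configuration; the snippet below is a rough sketch based on the Crane documentation (the values are arbitrary examples, and the exact field names may differ between versions):

```yaml
prediction:
  predictionWindowSeconds: 3600
  predictionAlgorithm:
    algorithmType: dsp
    dsp:
      sampleInterval: "60s"
      historyLength: "3d"
      estimators:
        fft:
        - marginFraction: "0.2"          # multiply the prediction by (1 + 0.2) to envelop more spikes
          lowAmplitudeThreshold: "1.0"   # drop frequency components whose amplitude is below this value
          highFrequencyThreshold: "0.05"
          minNumOfSpectrumItems: 10
          maxNumOfSpectrumItems: 20
```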
Other uses
Crane's prediction capability is a general feature provided by the TimeSeriesPrediction CRD, so it can predict any metric for which a calculation rule is supplied. As mentioned above, HPA often works together with cluster-level elasticity, so in addition to forecasting a service's resource level, it can also forecast the cluster's resource level and thus bring node resources online in advance.
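As an illustration, a TimeSeriesPrediction object that forecasts a node's CPU level might look roughly like this (a sketch based on the Crane examples; the node name is a placeholder and the exact spec depends on the version):

```yaml
apiVersion: prediction.crane.io/v1alpha1
kind: TimeSeriesPrediction
metadata:
  name: node-cpu-prediction
spec:
  targetRef:
    kind: Node
    name: node-1                  # placeholder node name
  predictionWindowSeconds: 3600
  predictionMetrics:
  - resourceIdentifier: cpu
    type: ResourceQuery
    resourceQuery: cpu
    algorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "7d"
```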
Overall architecture
Crane EHPA adds the prediction capability in a proxy-like fashion, which keeps the intrusion into the original architecture small. Its position in the overall architecture is shown in the upper-left corner of the figure above.
The original elasticity architecture mainly consists of the following components:
- HPA Controller: a controller inside the kube-controller-manager process. It periodically evaluates each HPA's threshold configuration against the real-time metric values to decide whether to scale up or down; it is the core component of autoscaling.
- PrometheusAdapter: an adapter that connects the K8S APIServer and Prometheus through a set of configuration rules (a sample rule is sketched after this list). By registering itself as an APIService through the K8S aggregation API mechanism, it lets the data in Prometheus be obtained indirectly by querying the K8S APIServer, serving as the data source for custom metrics.
- MetricsServer: the data source for real-time node and Pod resource metrics (CPU, memory, etc.); the HPA Controller queries it for Resource-type thresholds.
- Prometheus: the data source for historical data of all kinds of metrics. The HPA Controller's Pods- and External-type thresholds are ultimately served from it, and it also provides the historical data needed by the prediction function.
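For readers unfamiliar with PrometheusAdapter, its rules translate Prometheus series into custom metrics that the APIServer can serve; a typical rule looks like the following (illustrative only, the metric name is a placeholder):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # placeholder application metric
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```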
After introducing Crane EHPA, two components are mainly added:
- Craned: Craned's EHPA Controller watches EHPA objects and creates or updates the corresponding HPA and TSP objects. The HPA object is still managed by the HPA Controller, while Craned's Predictor module generates the prediction data for the corresponding metric according to the TSP object's configuration.
- MetricsAdapter: registered in the APIService in place of PrometheusAdapter, acting as a proxy in front of it (see the sketch after this list). Requests for a prediction metric are answered from the prediction data; all other requests are forwarded to PrometheusAdapter.
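The swap in the aggregation layer is done with an APIService registration along the lines of the sketch below; the Service name and namespace are assumptions taken from a typical Crane installation and should be verified against your deployment:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    name: metric-adapter       # assumed name of Crane's MetricsAdapter Service
    namespace: crane-system    # assumed installation namespace
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
```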
Rollout results
Applied to a service that is busy during the day and idle at night, as shown in the figure above, the blue prediction data almost coincides with the green real-time monitoring data, which shows that the prediction is accurate. The yellow curve is the maximum of the predicted data over the coming PredictionWindowSeconds described earlier, so the prediction metric can serve as one of the HPA thresholds and trigger scale-up in advance.
For services whose metrics are spiky, the smooth prediction data alleviates the jitter, and the DSP algorithm parameters can be tuned to customize the effect. In the figure above, the yellow curve is the prediction under the default parameters, and some spikes (green curve) still fall outside it. Increasing the MarginFraction parameter pushes the prediction curve further outward, which is the blue curve in the figure.
Summary
The above mainly covers the benefits of EHPA, but the author also ran into some limitations and areas for improvement during the rollout.
- At the time of the author's rollout, EHPA's prediction only supported CPU metrics, so the author implemented what was needed in a private fork. Since version 0.6.0, prediction of custom metrics is supported, so the community version can be used directly going forward.
- The current DSP algorithm mainly handles services with regular, periodic patterns. For scenarios such as holiday promotions, capacity can instead be expanded ahead of time with EHPA's cron-based scaling (a sketch follows this list), while the application's real-time monitoring metrics act as the safety net.
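A rough sketch of EHPA's cron-based scaling, written as an assumption based on the Crane documentation at the time of writing (field names and cron syntax should be verified against your version), scaling out ahead of a hypothetical promotion window:

```yaml
spec:
  crons:
  - name: "promotion-peak"          # hypothetical cron rule name
    timezone: "Asia/Shanghai"
    start: "0 9 * * *"              # example: add capacity at 09:00
    end: "0 23 * * *"               # example: release the extra capacity at 23:00
    targetReplicas: 50              # example replica count during the window
```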
Further reading: what is Crane
To help cloud-native users cut costs as much as possible while keeping their businesses stable, Tencent launched Crane (Cloud Resource Analytics and Economics), the industry's first Kubernetes-based cost optimization open source project. Crane follows FinOps standards and aims to provide cloud-native users with a one-stop solution for cloud cost optimization.
Effective HPA is Crane's horizontal elasticity tool, helping users move toward intelligent autoscaling.
Crane has joined the CNCF Landscape. You are welcome to follow the project and contribute: https://github.com/gocrane/crane .