Recently, the paper "RobustScaler: QoS-Aware Autoscaling for Complex Workloads", by the Alibaba Cloud Container Service team and the DAMO Academy data decision team, was accepted by ICDE 2022, a top international conference on data management and databases. ICDE, SIGMOD, and VLDB are widely regarded as the three top international academic conferences in the database field, and all three are on the list of Class A international conferences recommended by the China Computer Federation (CCF).


Alibaba Cloud Container Service (ACK) manages a large number of Kubernetes clusters and has accumulated rich experience in cluster management and cluster operation and maintenance. Building on that experience, the team created the intelligent operation and maintenance platform CIS (Container Intelligence Service), which aims to solve operation and maintenance problems through intelligent means. The DAMO Academy data decision team has worked on time series analysis, forecasting, anomaly monitoring, and AIOps for many years. It has published dozens of papers at KDD, SIGMOD, ICDE, AAAI, and other top conferences, holds many Chinese and American patents, and has won the 2022 ICASSP AIOps Challenge along with many other international awards.

Enterprise business traffic often shows pronounced peaks and valleys. Running a fixed number of instances sized for the peak wastes a large amount of resources during the valleys. Configuring elastic scaling for applications is an effective way to improve resource utilization.

The existing elastic scaling policies in Kubernetes, such as HPA and CronHPA, suffer from lag in triggering scaling, which degrades application service quality. The question is how to scale capacity out and in ahead of time, using the application's historical data and time series algorithms, while still guaranteeing application service quality.

To solve this problem, we propose RobustScaler, an intelligent elastic scaling framework based on a non-homogeneous Poisson process (NHPP) and stochastically constrained optimization. We also develop a specialized Alternating Direction Method of Multipliers (ADMM) to train the NHPP model efficiently, and we show that an optimization-based proactive policy can guarantee the application's quality of service. Extensive experiments show that RobustScaler outperforms common autoscaling strategies in various real-world scenarios, and that it also performs well for applications with complex periodicity.
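To make the NHPP idea concrete, here is a minimal Python sketch. It is not the RobustScaler algorithm and does not use ADMM. It relies only on the standard NHPP property that the number of arrivals in a window is Poisson distributed with mean equal to the integral of the intensity over that window. The sinusoidal `intensity` function, the per-instance capacity, and the 0.99 service level are all hypothetical values chosen for illustration.

```python
import math

PERIOD_MINUTES = 24 * 60  # assumption: traffic follows a daily cycle


def intensity(t_minutes: float) -> float:
    """Hypothetical arrival rate (requests per minute) at time t."""
    return 60.0 + 40.0 * math.sin(2 * math.pi * t_minutes / PERIOD_MINUTES)


def expected_arrivals(start: float, end: float, steps: int = 1000) -> float:
    """Integrate the intensity over [start, end] with the midpoint rule."""
    width = (end - start) / steps
    total = sum(intensity(start + (i + 0.5) * width) for i in range(steps))
    return total * width


def poisson_quantile(mean: float, level: float) -> int:
    """Smallest k with P(N <= k) >= level for N ~ Poisson(mean).

    Plain summation of the probability mass function; only suitable
    for moderate means because exp(-mean) underflows for large ones.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if not 0.0 <= mean <= 700.0:
        raise ValueError("mean must be in [0, 700] for this simple sketch")
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < level:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def instances_for_window(start: float, end: float,
                         per_instance_capacity: int,
                         level: float = 0.99) -> int:
    """Instances needed so that demand in the window is covered
    with probability at least `level` (hypothetical sizing rule)."""
    demand = poisson_quantile(expected_arrivals(start, end), level)
    return max(1, math.ceil(demand / per_instance_capacity))


if __name__ == "__main__":
    for hour in (0, 6, 12, 18):
        start = hour * 60
        count = instances_for_window(start, start + 1, per_instance_capacity=20)
        print(f"hour {hour:2d}: {count} instances")
```

Sizing to a quantile of the arrival count rather than to its mean is one simple way to express a probabilistic service-quality constraint; the constraint used in the paper may be formulated differently.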

The RobustScaler algorithm has been applied to the AHPA component of the intelligent operation and maintenance platform CIS. CIS consists of four modules: anomaly detection, anomaly localization, anomaly remediation, and anomaly prediction. Its functions include regular inspection, network diagnosis, runtime diagnosis, CVE vulnerability repair, application configuration optimization, and many others.

AHPA is one of the core components of CIS. The component architecture is shown in the figure below. AHPA's elastic scaling strategies are divided into active prediction and passive prediction. Active prediction identifies cyclical trends in historical data and proactively predicts the number of instances the application needs in the next cycle. Passive prediction sets the number of instances based on real-time application data, which copes well with burst traffic. In addition, AHPA adds a fallback protection policy: users can set upper and lower bounds on the number of instances. The number of instances that finally takes effect is the maximum of the values given by active prediction, passive prediction, and the fallback protection policy.
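The way the final instance count is chosen can be sketched in a few lines of Python. This is an illustrative sketch, not AHPA's code: the names `Bounds` and `effective_replicas` are hypothetical, and clamping the result to the user-set upper bound is an assumption, since the text only states that the maximum of the three values takes effect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bounds:
    """User-set limits on the number of instances (hypothetical type)."""
    min_replicas: int
    max_replicas: Optional[int] = None


def effective_replicas(active: int, passive: int, bounds: Bounds) -> int:
    """Combine the two predictions with the user-set bounds."""
    # As described in the text: take the maximum of the active prediction,
    # the passive prediction, and the user-set lower bound.
    desired = max(active, passive, bounds.min_replicas)
    # Assumption: the user-set upper bound caps the result.
    if bounds.max_replicas is not None:
        desired = min(desired, bounds.max_replicas)
    return desired


if __name__ == "__main__":
    # A burst pushes the passive prediction above the active one.
    print(effective_replicas(active=4, passive=7, bounds=Bounds(2, 10)))
```

Taking the maximum favors service quality over cost: whenever the predictions disagree, the larger instance count is used.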

The AHPA component is in public beta. You can apply for the whitelist [1] to try it, and your feedback is welcome.

[Figure: AHPA component architecture]

For details, see the Alibaba Cloud Container Service AHPA elastic prediction product documentation. AHPA is currently open for invitational testing. Interested users can apply for the whitelist through the "Submit Work Order" (ticket) entry in the documentation. We look forward to your trial and feedback.

Related Links

[1] Apply for whitelist https://help.aliyun.com/document_detail/412229.html


Alibaba Cloud Native (阿里云云原生)