Abstract: CCF BDCI (China Computer Federation Big Data & Computational Intelligence Contest) Huawei Serverless workload prediction runner-up solution and ModelArts experience sharing

This article is shared from the Huawei Cloud community post "Free ModelArts computing resources: winning runner-up in the CCF BDCI Huawei Serverless workload prediction competition", by the original author wyhhyw.

Introduction

The task is to accurately predict the workload of a Serverless software architecture from historical data and trends, which makes it easier to optimize resource scheduling and improve quality of service.

The data covers the usage of 43 queues over several days, including CPU usage, disk usage, number of submitted jobs, and whether each job was submitted successfully. Based on this history, contestants must predict the CPU usage and the number of submitted jobs at five-minute intervals over the next 25 minutes of the test set. Competition link: https://www.datafountain.cn/competitions/468.

Problem analysis

This is a very typical time-series regression problem: predict CPU usage and the number of submitted jobs at five future time points. The target can be modeled from the following perspectives.

  • trend fitting: fit a curve to the usage rate and job counts before the time point to be predicted and extrapolate, using models such as ARIMA.
  • single-label regression: predict the target with a fixed step of 5, e.g. x1 -> x6, x2 -> x7. The details are shown in the figure below:
    image.png
  • multi-label regression: use the pandas.shift function to construct lag features from the history and predict one future time point per label. The schematic is as follows:
    image.png
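As a minimal sketch of the multi-label sample construction (on a made-up single-queue series; column names are illustrative), `pandas.shift` builds both the lag features and the five future targets:

```python
import pandas as pd

# Toy series: CPU usage for one queue at 5-minute intervals (made-up values).
df = pd.DataFrame({"cpu": [10, 21, 41, 31, 34, 27, 18, 22, 30, 25]})

# Lag features: usage 1..3 steps in the past (shift moves values forward).
for lag in range(1, 4):
    df[f"cpu_lag{lag}"] = df["cpu"].shift(lag)

# Multi-label targets: usage 1..5 steps ahead (negative shift looks into the future).
for step in range(1, 6):
    df[f"target_t+{step}"] = df["cpu"].shift(-step)

# Drop rows whose lags or targets fall outside the series.
samples = df.dropna().reset_index(drop=True)
print(samples.shape)  # (2, 9): 3 lag rows and 5 horizon rows are lost
```

Each remaining row is one training sample with three historical values as features and five future values as its labels.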

Solution overview

The author was fortunate to finish second in the competition (second prize). The modeling method ensembles the predictions of LightGBM and an LSTM; on the leaderboard, the LightGBM model alone ranked second and the LSTM about tenth. Since the LSTM structure adopted in this solution is fairly simple and its results were not ideal, while the third-place team's LSTM (also a second prize) was the best-performing neural network among the finalists, this article also introduces their architecture.

Data analysis

Before feature engineering and modeling, let's start with a round of EDA~

The figure below shows the distribution of CPU usage across queues. The distributions differ substantially between queues, so the queue ID is a very strong feature for predicting CPU usage.
image.png

The figure below shows how one queue's CPU usage varies by hour: usage is high from the afternoon through 3 a.m., so the hour and minute are also very strong features. Note that the organizers desensitized the year, month, and day of the timestamps, so only hour and minute features can be used.
image.png
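A quick sketch of that per-hour EDA, assuming a dataframe with `queue_id`, `hour`, and `cpu` columns (the names and values here are made up for illustration):

```python
import pandas as pd

# Made-up records: (queue_id, hour, CPU usage).
df = pd.DataFrame({
    "queue_id": [1, 1, 1, 2, 2, 2],
    "hour":     [3, 3, 15, 3, 15, 15],
    "cpu":      [80, 70, 20, 60, 10, 30],
})

# Mean CPU usage per queue per hour -- the basis of the plots above.
hourly = df.groupby(["queue_id", "hour"])["cpu"].mean().reset_index()
print(hourly)
```

Plotting `hourly` per queue reproduces the hour-of-day trend described above.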

Feature engineering

The indispensable part: features are king.

  • sliding (lag) features: use the pandas.shift function to build lagged versions of features such as CPU usage.
  • differential features: construct differences of various orders on top of the lag features.
  • sliding-window statistics: open a window over the lag features and slide it, taking statistics such as the mean, variance, and maximum of the values in each window.
  • aggregate statistics: for example, the historical mean and variance of CPU usage in each hour.
  • pseudo time-travel features: features that peek into the future are normally forbidden, but for time-series problems pseudo time-travel features can be constructed, which are really aggregate statistics. For example, if the point to be predicted is 9 a.m., statistics such as the historical mean at 10 a.m. can be computed from past data, and features such as differences and ratios derived from them.
    image.png
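The four feature families above can be sketched on a toy single-queue frame (all values and column names here are made up; the real pipeline would run per queue over the full history):

```python
import pandas as pd

# Toy single-queue frame; `hour` is all that survives timestamp desensitization.
df = pd.DataFrame({
    "hour": [9, 9, 10, 10, 9, 10, 9, 10],
    "cpu":  [30, 35, 50, 55, 28, 52, 33, 48],
})

# 1. Sliding (lag) features via pandas.shift.
df["cpu_lag1"] = df["cpu"].shift(1)

# 2. Differential features: first- and second-order differences of the lag.
df["cpu_diff1"] = df["cpu_lag1"].diff(1)
df["cpu_diff2"] = df["cpu_lag1"].diff(2)

# 3. Sliding-window statistics over past values (window of 3 lagged points).
roll = df["cpu_lag1"].rolling(3)
df["cpu_roll_mean"] = roll.mean()
df["cpu_roll_std"] = roll.std()
df["cpu_roll_max"] = roll.max()

# 4. Aggregate statistics: historical mean CPU usage per hour, merged back,
#    plus a derived difference against the current lag value.
hour_mean = df.groupby("hour")["cpu"].mean().rename("cpu_hour_mean").reset_index()
df = df.merge(hour_mean, on="hour", how="left")
df["cpu_vs_hour_mean"] = df["cpu_lag1"] - df["cpu_hour_mean"]
print(df.tail())
```

A pseudo time-travel feature would be built the same way as step 4, except keyed on the hour after the prediction point rather than the current one.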

Model

  • Modeling strategy: the multi-label regression described in the problem analysis; in general, this modeling approach gives better results.
  • lightgbm: run five-fold cross-validation for each label. CPU usage and submitted-job count each have five time points, so ten five-fold cross-validations are performed in total.
  • lstm: CPU usage is an integer from 0 to 100, so data at five consecutive time points such as "10-21-41-31-34" can be treated like character indices in NLP. Each value can directly index a word vector in an embedding lookup table, and the modeling then transitions naturally to an LSTM.
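The ten independent five-fold cross-validations can be sketched as follows. Since the model itself is interchangeable in this loop, `Ridge` is used as a stand-in here (swap in `lightgbm.LGBMRegressor` for the actual solution); the data is random and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in; swap in lightgbm.LGBMRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))    # engineered features (made up)
Y = rng.normal(size=(200, 10))   # 5 CPU-usage targets + 5 job-count targets

oof = np.zeros_like(Y)           # out-of-fold predictions per label
for j in range(Y.shape[1]):      # one independent 5-fold CV per label
    kf = KFold(n_splits=5, shuffle=True, random_state=j)
    for tr, va in kf.split(X):
        model = Ridge().fit(X[tr], Y[tr, j])
        oof[va, j] = model.predict(X[va])
print(oof.shape)
```

With five folds per label this trains 50 models in total, which is exactly the compute burden discussed in the training section.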

The figure below shows the LSTM architecture used in this solution. The result was not very satisfactory, ranking about tenth on the leaderboard. After analyzing and discussing with teammates after the competition, I think our framework has several problems. First, treating each CPU usage value as a word vector means the LSTM's input_dim is 1, which is clearly not enough. Second, our framework is very simple and introduces no convolution or attention mechanism.
image.png

Below is the LSTM framework of the other second-prize team. It consists of two parts:
(1) an LSTM that extracts the temporal information of CPU usage and disk usage, with an attention mechanism;
(2) fully connected layers that extract the information of the other hand-crafted features, perform higher-order feature crossing, and use cross-layer (skip) connections to form "an ensemble of models at different scales".
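The attention step in (1) can be sketched independently of any framework: score each time step's hidden state, softmax the scores over time, and take the weighted sum as a context vector. This is a minimal NumPy illustration with random stand-in values, not the team's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 5, 16                       # 5 time steps, hidden size 16
hidden = rng.normal(size=(T, H))   # stand-in for LSTM hidden states h_1..h_T
w = rng.normal(size=H)             # scoring vector (learnable in a real model)

scores = hidden @ w                      # one score per time step
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax over time steps
context = weights @ hidden               # (H,) attention-weighted summary
print(weights.round(3), context.shape)
```

The context vector then feeds the downstream fully connected layers instead of (or alongside) the last hidden state alone.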
image.png

Training

  • loss: the data fluctuates heavily and can be assumed to contain outliers, so smooth L1 is used as the loss function.
  • computing power: LightGBM modeling needs little compute; 16 GB of memory is sufficient. Training the neural network above, however, is multi-label regression (10 labels) with one model per label; combined with five-fold cross-validation that means 50 models, which demands considerable compute.
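For reference, smooth L1 is quadratic for small residuals and linear for large ones, so outliers contribute bounded gradients. A minimal NumPy version (with the common threshold parameter `beta`, here defaulting to 1):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for |residual| < beta, linear beyond,
    so large outliers pull on the model less than under plain L2."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

print(smooth_l1(np.array([0.5, 3.0]), np.array([0.0, 0.0])))  # 1.3125
```

The small residual (0.5) is penalized quadratically (0.125) while the large one (3.0) only linearly (2.5), which is exactly the outlier robustness the bullet above relies on.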

The author only started training the NN models near the end of the competition, with nothing but a toy GTX 1650 graphics card. Training dozens of models for this multi-label task on it would have been far too slow, so I had to look for compute elsewhere. On a senior classmate's recommendation I settled on Huawei Cloud's ModelArts, which offers two hours of free V100 time per day, which felt great. In the end my lab mates opened a few more accounts; we saved checkpoints while training, resumed when the daily quota reset, and finished all the models in two or three days. Although the time limit meant switching accounts, the overall experience was still good. Here is a brief introduction to the ModelArts experience.

ModelArts experience

Impressions

(1) Although training happens in the cloud, ModelArts provides JupyterLab/notebooks, so it feels just like uploading data to a notebook on a local PC and writing code; the change of training environment is barely noticeable. Different engines are integrated in the lab, such as PyTorch, TensorFlow, and XGBoost.
image.png
image.png

(2) Installing dependency packages is very convenient: just run `!pip install xxx` in a cell. For example, my new notebook uses the PyTorch engine but XGBoost was required, so I could install it directly, as shown in the figure below.
image.png

(3) There is currently a limit on uploads: only a few hundred MB at a time. You can upload the raw data to the notebook first and then do the feature engineering there; for larger data, split it locally and upload it in batches. This is a minor inconvenience; after all, the free V100 is well worth it.

There are other details as well; feel free to explore on your own!

Key point: how to apply

What are you waiting for? Click the link and grab the computing power!!! https://bbs.huaweicloud.com/forum/thread-51080-1-1.html

For more AI assets such as data, algorithms, and models, click "Learn more"; the AI Gallery is waiting for you!

Click to follow and learn about the latest Huawei Cloud technologies~
