Abstract: This paper addresses the anomaly detection problem for multivariate time series data and proposes a deep neural network algorithm based on GAN and autoencoder structures, achieving state-of-the-art (SOTA) detection performance. The paper is one of the key technical achievements of the Huawei Cloud Database Innovation Lab in time series analysis.
This article is shared from the Huawei Cloud Community post "ICDE'21 DAEMON Paper Interpretation", author: Cloud Database Innovation Lab.
Overview
This article (DAEMON: Unsupervised Anomaly Detection and Interpretation for Multivariate Time Series) was published at ICDE'21 by the Huawei Cloud Database Innovation Lab together with the Data and Intelligence Laboratory of the University of Electronic Science and Technology of China. It proposes a deep neural network algorithm based on GAN and autoencoder structures for anomaly detection on multivariate time series data, achieving state-of-the-art (SOTA) detection performance. ICDE is a CCF Class A international academic conference and one of the top conferences in the field of databases and data mining. This paper is one of the key technical achievements of the Huawei Cloud Database Innovation Lab in time series analysis.
1. Summary
With the advent of the IoT era, more and more time series data collected by sensors is being stored in databases, and how to process this massive data to mine its value has become a popular research topic in academia and industry in recent years. This paper studies the anomaly detection problem for multivariate time series data, in order to diagnose possible anomalies of the entity that generates the data.
The main contributions of this article are as follows:
- The DAEMON algorithm is proposed. It is built on an autoencoder combined with a GAN structure: the autoencoder reconstructs the input time series, while the GAN structure constrains both the intermediate (latent) output and the reconstructed output of the autoencoder, making training more robust and reducing overfitting.
- A root cause localization method based on the reconstruction results of multivariate anomaly detection is proposed.
- DAEMON outperforms existing algorithms on the benchmark datasets.
2. Background
3. Algorithm design
Fig. 1 DAEMON's network structure
A. Introduction to the algorithm structure
The overall network structure of the DAEMON algorithm is shown in Fig. 1. It contains three modules: a variational autoencoder G_A (consisting of an encoder G_E and a decoder G_D, which simultaneously act as the generators of two GAN structures), a discriminator D_E paired with the encoder, and a discriminator D_D paired with the decoder.
The specific function of each module is briefly described in the subsections below.
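As a minimal sketch of how these three modules could be laid out (the layer types and sizes are illustrative assumptions, not the paper's exact architecture, and the encoder is simplified to a deterministic code), a PyTorch version might look roughly like this:

```python
# Hedged sketch of DAEMON's three modules; all hyperparameters are illustrative.
import torch
import torch.nn as nn

class Encoder(nn.Module):          # G_E: maps a window of the multivariate series to a latent code z
    def __init__(self, n_dims, window, latent):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_dims * window, 128), nn.ReLU(),
            nn.Linear(128, latent),          # the paper's encoder is variational; simplified here
        )

    def forward(self, w):                    # w: (batch, window, n_dims)
        return self.net(w)

class Decoder(nn.Module):          # G_D: reconstructs the window from the latent code
    def __init__(self, n_dims, window, latent):
        super().__init__()
        self.n_dims, self.window = n_dims, window
        self.net = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(),
            nn.Linear(128, n_dims * window),
        )

    def forward(self, z):
        return self.net(z).view(-1, self.window, self.n_dims)

class Discriminator(nn.Module):    # D_E / D_D: probability that the input is "real"
    def __init__(self, in_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x.flatten(1))
```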
B. Data preprocessing
- Data cleaning: the spectral residual algorithm is first used to remove anomalous points that may exist in the training data, so that the VAE learns the normal pattern of the time series more accurately.
- Data normalization: the training and test data are normalized with min-max normalization (a minimal sketch of these preprocessing steps follows this list).
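A minimal sketch of the preprocessing, assuming windows are cut from the normalized series before being fed to the network (the spectral-residual cleaning step is omitted, and the window slicing is an assumption):

```python
import numpy as np

def minmax_normalize(train, test):
    """Min-max normalize both splits using statistics of the training data only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)   # guard against constant dimensions
    return (train - lo) / scale, (test - lo) / scale

def sliding_windows(series, window):
    """Cut a (T, n_dims) series into overlapping windows of shape (window, n_dims)."""
    return np.stack([series[i:i + window] for i in range(len(series) - window + 1)])
```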
C. Offline training process
DAEMON's network consists of three modules: a variational autoencoder and the two discriminators of the GAN structures. Since the generator and discriminator of a GAN must be trained alternately, DAEMON has three separate training procedures, each with its own optimizer and loss function.
Each module is introduced below:
GAN structure 1: the generator corresponds to the encoder part G_E of the variational autoencoder, and the discriminator is D_E. The purpose of this GAN structure is to constrain the latent distribution q(z) produced by the encoder. From the standard GAN loss, the discriminator and generator losses can be derived as
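The loss equations appear as images in the original post; as a hedged reconstruction, the standard adversarial objectives for the pairing (G_E, D_E) would take roughly the following form (mapping them to the post's numbering (1)-(2) is my assumption):

```latex
% z' ~ p(z) is a sample from the prior, W_x an input window, G_E(W_x) the latent code.
\mathcal{L}_{D_E} = -\,\mathbb{E}\big[\log D_E(z')\big]
                    -\,\mathbb{E}\big[\log\big(1 - D_E(G_E(W_x))\big)\big]   \quad (1)
\mathcal{L}_{G_E} = -\,\mathbb{E}\big[\log D_E(G_E(W_x))\big]                \quad (2)
```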
GAN structure 2: the generator corresponds to the decoder part G_D of the variational autoencoder, and the discriminator is D_D. The purpose of this GAN structure is to further constrain the output of the autoencoder, so that it better learns the normal distribution of the time series data. Analogously, the discriminator and generator losses are
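Again as a hedged reconstruction (assumed to correspond to the post's (3)-(4)), the pairing (G_D, D_D) acts on reconstructed windows:

```latex
% W_x is an original window, G_D(G_E(W_x)) its reconstruction.
\mathcal{L}_{D_D} = -\,\mathbb{E}\big[\log D_D(W_x)\big]
                    -\,\mathbb{E}\big[\log\big(1 - D_D(G_D(G_E(W_x)))\big)\big]   \quad (3)
\mathcal{L}_{G_D} = -\,\mathbb{E}\big[\log D_D(G_D(G_E(W_x)))\big]                \quad (4)
```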
Variational autoencoder module: the variational autoencoder performs the data reconstruction, and its own loss function is defined by a norm distance between input and output:
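A hedged reconstruction of this loss (the post only says "a norm distance"; the choice of the L1 norm here is an assumption):

```latex
\mathcal{L}_{AE} = \big\| W_x - G_D(G_E(W_x)) \big\|_1   \quad (5)
```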
Note that the discriminator losses in GAN structures 1 and 2 involve only the discriminators themselves, so during training they can be optimized directly using (1) and (3). The generator losses and the loss of the variational autoencoder, however, share a common module, namely the variational autoencoder itself. Therefore, when training the autoencoder network, three loss functions are effectively optimized at the same time: their weighted sum is taken as the loss function of the variational autoencoder, namely
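So the combined autoencoder objective, presumably what formula (6) denotes, is a weighted sum of the two generator losses and the reconstruction loss (the lambda symbols are placeholders for the weights):

```latex
\mathcal{L}_{G_A} = \lambda_1 \,\mathcal{L}_{G_E} + \lambda_2 \,\mathcal{L}_{AE} + \lambda_3 \,\mathcal{L}_{G_D}   \quad (6)
```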
During offline training, the network is updated according to formulas (1), (3), and (6) in sequence.
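A hedged sketch of one offline training step in this order, reusing the module sketch above (the Adam-style optimizers, the BCE formulation of the adversarial losses, and the lambda weights are all illustrative assumptions):

```python
# One asynchronous training step, following the order (1) -> (3) -> (6).
import torch
import torch.nn.functional as F

def train_step(w, G_E, G_D, D_E, D_D, opt_DE, opt_DD, opt_G, latent, lambdas=(1.0, 1.0, 1.0)):
    real = torch.ones(w.size(0), 1)
    fake = torch.zeros(w.size(0), 1)
    bce = F.binary_cross_entropy

    # (1) discriminator of GAN structure 1: prior samples are "real", encoder codes are "fake"
    z_prior = torch.randn(w.size(0), latent)
    z = G_E(w)
    loss_de = bce(D_E(z_prior), real) + bce(D_E(z.detach()), fake)
    opt_DE.zero_grad(); loss_de.backward(); opt_DE.step()

    # (3) discriminator of GAN structure 2: original windows are "real", reconstructions are "fake"
    w_rec = G_D(z)
    loss_dd = bce(D_D(w), real) + bce(D_D(w_rec.detach()), fake)
    opt_DD.zero_grad(); loss_dd.backward(); opt_DD.step()

    # (6) autoencoder: weighted sum of the two generator losses and the reconstruction loss
    z = G_E(w)
    w_rec = G_D(z)
    loss_ge  = bce(D_E(z), real)           # fool D_E
    loss_gd  = bce(D_D(w_rec), real)       # fool D_D
    loss_rec = (w - w_rec).abs().mean()    # norm distance between input and reconstruction
    loss_g = lambdas[0] * loss_ge + lambdas[1] * loss_rec + lambdas[2] * loss_gd
    opt_G.zero_grad(); loss_g.backward(); opt_G.step()   # opt_G holds G_E and G_D parameters

    return loss_de.item(), loss_dd.item(), loss_g.item()
```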
D. Online detection process
After an online data window W_{x_t} is fed into the detector, its reconstruction W'_{x_t} is obtained; the point to be detected x_t is then compared with its reconstruction x'_t to obtain the anomaly score, i.e.
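A hedged reconstruction of the anomaly score, presumably what formula (7) denotes: the per-dimension reconstruction errors are summed over all M dimensions (the absolute-error form is an assumption):

```latex
S_{x_t} = \sum_{j=1}^{M} S_{x_t}^{j}, \qquad S_{x_t}^{j} = \big|\, x_t^{j} - x_t'^{\,j} \,\big|   \quad (7)
```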
E. Root cause analysis
As formula (7) shows, the anomaly score is the sum of the per-dimension errors. Therefore, for root cause localization, the indices of the k largest per-dimension scores S_{x_t}^j are taken directly as the dimensions where the root cause most likely appears.
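A minimal sketch of this top-k localization step (the function name and the default k are hypothetical):

```python
import numpy as np

def locate_root_cause(x_t, x_t_rec, k=3):
    """Return the indices of the k dimensions with the largest reconstruction error."""
    per_dim_error = np.abs(x_t - x_t_rec)        # S_{x_t}^j for each dimension j
    return np.argsort(per_dim_error)[::-1][:k]   # dimensions most likely to be the root cause
```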
4. Experiment
4.1 Environment Settings
In the experiments, the authors evaluate on four commonly used public time series anomaly detection datasets: SMD, SMAP, MSL, and SWaT. The statistics of each dataset are listed below.
The evaluation metrics are precision, recall, and F1-score.
For comparison, the authors evaluate 8 existing algorithms. Among them, the VAE baseline is the DAEMON structure with the GAN components removed, which is used to test the effectiveness of the GAN constraints. To further demonstrate the effectiveness and novelty of the GAN structure in this paper, two other GAN-based anomaly detection algorithms, GANomaly and BeatGAN, are also compared. In addition, OmniAnomaly is an anomaly detection algorithm published at KDD by Dan Pei's team at Tsinghua University, a well-known AIOps group in the industry.
The following table lists the parameter settings reported by the authors.
4.2 Test results
The experimental comparison results are shown in the table below.
It can be seen that DAEMON achieves SOTA performance on the four public datasets.
4.3 Time consumption
In terms of training and detection time, the DAEMON algorithm also ranks in the upper-middle range among existing algorithms.
Fig. 2 Comparison of training and detection time
4.4 Root cause localization
Finally, the authors compared root cause localization accuracy; DAEMON also achieves SOTA performance among the compared algorithms.
Fig. 3 Comparison of root cause localization accuracy
5. Application
This algorithm has been integrated into GaussDB for Influx, Huawei Cloud's time series storage and analysis service, where it is used for anomaly detection and root cause localization of monitoring metrics.
Fig. 4 DAEMON application scenario
6. Summary
In this paper, the authors propose DAEMON, an algorithm based on a variational autoencoder and GANs, for multivariate time series anomaly detection. Experiments show that DAEMON achieves SOTA detection performance on public datasets, as well as SOTA root cause localization accuracy. In addition, its training and detection time efficiency ranks in the upper-middle range among existing algorithms.
Huawei Cloud Database Innovation Lab official website: https://www.huaweicloud.com/lab/clouddb/home.html