Abstract: This paper proposes a graph convolutional network architecture based on local feature preservation. Compared with state-of-the-art baseline algorithms, the method substantially improves graph classification performance on multiple datasets and also achieves better generalization.
This article is shared from the HUAWEI CLOUD Community post "Paper Interpretation: Graph Convolutional Neural Network Architecture (LPD-GCN) Based on Local Feature Preservation", original author: PG13.
In recent years, researchers have developed many graph convolutional network methods for graph-level representation learning and classification. However, current graph convolutional networks cannot effectively preserve the local information of a graph. This is particularly problematic for graph classification, because the goal of graph classification is to distinguish different graph structures based on the learned graph-level representations. To address this problem, this article introduces a graph convolutional network architecture based on local feature preservation [1]. Compared with state-of-the-art baseline algorithms, the method substantially improves graph classification performance on multiple datasets and also generalizes better.
1. Introduction
Graph (network) structured data captures rich information between entities by modeling nodes and the edges that connect them. It is widely used in many research fields, including biology (protein-protein interaction networks), chemistry (molecular and compound structures), and social science (social networks and document citation networks). Graph-structured data not only stores structural information efficiently but also plays an extremely important role in modern machine learning tasks. Among these tasks, graph classification, which has been studied extensively in recent years, aims to assign a given graph to a specific category. For example, to distinguish the various graph structures of organic molecules in chemistry, one must infer and aggregate the entire graph topology (in a molecular network, the topology consists of individual atoms and their direct bonds) together with node features (such as atomic properties), and then use the aggregated information to predict the category of the graph.
In recent years, many techniques have been published to address the graph classification problem. A traditional and popular technique is to design a graph kernel to compute the similarity between graphs and then feed it into a kernel-based classifier (such as an SVM) to perform graph classification. Although graph-kernel methods are effective, they suffer from computational bottlenecks, and their feature selection step is separated from the subsequent classification step. To overcome these challenges, end-to-end graph neural network methods have attracted increasing research attention. Among them, graph convolutional networks (GCNs) are the most popular graph neural network approach to graph classification.
Current graph convolutional networks roughly follow the Message Passing Neural Network (MPNN) framework [2], which consists of two parts: a message passing phase and a readout phase. The message passing phase updates the feature vector of each node by aggregating the features of its neighbors, and the readout phase generates graph-level features through a global pooling module. A graph convolutional network applies the message passing function iteratively, so feature information can propagate over long distances and the model can learn neighborhood features at different ranges. After k graph convolution operations, useful node or edge features can be extracted to solve many node- and edge-level tasks (for example, node classification and link prediction). To solve graph-level tasks (such as graph classification), the readout module aggregates all node or local structure information to produce a graph-level representation. The figure below shows the general framework of graph convolutional networks for graph classification. Based on this message passing framework, researchers have developed many GCN variants with different message passing functions, node update functions, and readout modules.
However, the main limitation of existing graph convolutional network methods for graph-level representation learning is that they do not make effective use of local feature information. In other words, they overemphasize the ability to distinguish different graph structures while neglecting the local expressiveness of node representations, which easily leads to over-smoothing (the feature representations of all nodes tend to become identical). This over-smoothing problem becomes more severe as the number of layers grows, because the local neighborhood aggregation process does not effectively differentiate the feature information of different neighbors. As a result, the learned node features have weak local expressiveness, and the over-smoothing effect severely limits the expressiveness of the global graph-level features.
Since a graph-level representation is obtained by aggregating the local features of nodes, maintaining local expressiveness during optimization is a key prerequisite for improving graph representation power. For graph-level representation learning, existing approaches to preserving local feature expressiveness can be roughly divided into three categories: (1) designing different graph convolution and readout operations, (2) designing hierarchical clustering methods, and (3) exploring new model architectures. In the first category, Xu et al. found that graph-level representations learned with methods under the existing message passing framework cannot effectively distinguish different graph structures, and proposed the Graph Isomorphism Network (GIN) [3]. GIN uses an injective aggregation-update scheme to map different node neighborhoods to different feature vectors; in this way the local structure and node features of the graph are preserved, making the graph neural network as powerful as the Weisfeiler-Lehman test. Fan et al. proposed a structured self-attention architecture, similar to graph attention networks (GATs) [4], for graph-level representation learning: a node-centric attention mechanism aggregates the features of neighboring nodes with different learnable weights, while hierarchical and graph-level attention mechanisms serve as the readout module, aggregating important features from different nodes and different depths into the model output. In the second category, hierarchical clustering methods, a large body of work shows that, beyond the dichotomy between node-level and graph-level structure, graphs exhibit other rich hierarchical structures. For example, a recent work proposed DIFFPOOL [5], a differentiable hierarchical pooling method that can be trained jointly with graph convolutions and used to extract local feature information.
In summary, the above two categories of methods can fit most training datasets well, but their generalization ability is limited: their performance on test sets is mediocre, and it is hard for them to break through the bottleneck of existing methods. In the third category, exploring new model architectures, some researchers try to address the practical difficulties of training graph convolutional networks, such as the over-smoothing problem. For example, Xu et al. [6] proposed the Jumping Knowledge Network (JK-Net) architecture, which connects the last graph convolutional layer of the network with all previous hidden layers, similar to residual networks. With this design, the last layer of the model can selectively use neighborhood information from different earlier layers, so node-level representations can be captured well within a fixed number of graph convolution operations; the benefit of these residual-style connections becomes more prominent as network depth increases. This jump structure has been shown to significantly improve performance on node-level tasks, but few researchers have explored its effectiveness on graph-level tasks such as graph classification. In GIN, Xu et al. further proposed a JK-Net-like architecture for learning graph-level representations: a readout layer follows each convolutional layer to learn graph-level representations at different depths, and these representations are then concatenated to form the final representation. This readout architecture considers global information at all depths and can effectively improve the generalization ability of the model.
2. Graph Convolutional Neural Network (GCN)
(1) Problem definition
Given an undirected graph G = {V, E}, V denotes the set of nodes and E the set of edges. In addition, X_v denotes the initial features of node v. The goal of graph convolutional networks is to learn continuous representations of arbitrary graph instances that encode both node features and topological structure. Suppose we are given a set of M labeled graphs G = {G1, G2, ..., GM} with corresponding labels Y = {y1, y2, ..., yM}. The goal of graph classification is to use them as training data to build a classifier gθ that assigns any new input graph G to a specific category yG, that is, yG = gθ(hG).
(2) Graph Convolutional Neural Network
GCNs consider both the structural information of the graph and the feature information of each node to learn node-level and/or graph-level representations that best serve the final task. Generally speaking, existing GCN variants first aggregate the neighborhood information and then combine the resulting neighborhood representation with the center node's representation from the previous iteration. Formally, a GCN iteratively updates node representations as

$$a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\left\{h_u^{(k-1)} : u \in \mathcal{N}(v)\right\}\right), \qquad h_v^{(k)} = \mathrm{COMBINE}^{(k)}\left(h_v^{(k-1)}, a_v^{(k)}\right),$$

where $h_v^{(k)}$ denotes the feature representation of node $v$ at the $k$-th iteration, $\mathrm{AGGREGATE}^{(k)}(\cdot)$ and $\mathrm{COMBINE}^{(k)}(\cdot)$ are the learnable message passing functions of the $k$-th graph convolutional layer, and $\mathcal{N}(v)$ denotes the set of neighbors of node $v$. After $K$ iterations, the final node representations $h_v^{(K)}$ can be used for node label prediction, or passed to the readout stage for graph classification. In the readout stage, a feature vector $h_G$ for the entire graph is computed by aggregating node features with a readout function:

$$h_G = \mathrm{READOUT}\left(\left\{h_v^{(K)} : v \in V\right\}\right).$$

The READOUT() function can be a simple permutation-invariant function, such as summation, or a graph-level pooling operation such as DIFFPOOL or SORTPOOL.
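To make the scheme above concrete, here is a minimal PyTorch sketch of one AGGREGATE/COMBINE layer followed by a sum readout. The mean aggregation, the dense adjacency matrix, and the names `SimpleGCNLayer` and `sum_readout` are illustrative choices for this sketch, not the operators used by any specific GCN variant in the paper.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.combine = nn.Linear(2 * in_dim, out_dim)  # COMBINE over [h_v, a_v]

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # AGGREGATE: mean of neighbor features; adj is an (N, N) 0/1 matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        a = adj @ h / deg
        # COMBINE: mix the previous node state with the aggregated neighborhood
        return torch.relu(self.combine(torch.cat([h, a], dim=-1)))

def sum_readout(h: torch.Tensor) -> torch.Tensor:
    # READOUT: permutation-invariant summation over all nodes
    return h.sum(dim=0)

# Toy usage: 4 nodes on a path graph with 8-dimensional features
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
h = torch.randn(4, 8)
layer = SimpleGCNLayer(8, 16)
h_graph = sum_readout(layer(h, adj))
print(h_graph.shape)  # torch.Size([16])
```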
3. Method introduction
To address the insufficient local-information preservation and limited generalization ability of existing methods, this article improves both the loss function and the model architecture, resulting in the LPD-GCN model. GCNs learn a graph-level representation of the whole graph from its topology and node features. From the loss perspective, in order to fully exploit node feature information, LPD-GCN adds an auxiliary local node feature reconstruction task to improve the local expressiveness of the hidden node representations and to enhance the discriminative power of the final graph-level representation; that is, an additional auxiliary constraint is introduced to preserve the local information of the graph. This node feature reconstruction task is realized through a simple but effective encoder-decoder mechanism, in which stacked graph convolutional layers serve as the encoder and a multi-layer perceptron (MLP) is appended as the decoder. In this way, the input node features are embedded into hidden representations by the encoder, and these hidden representations are then fed into the decoder to reconstruct the initial node features. From the architecture perspective, a densely connected graph convolution architecture is designed to establish connections between different layers, so that information from neighborhoods at different depths can be used flexibly and fully. Specifically, each convolutional layer and its corresponding readout module are connected to all previous convolutional layers.
(1) Node feature reconstruction based on encoding-decoding mechanism
The graph-level representation and discriminative ability of traditional GCNs are limited because they overemphasize global information and ignore the preservation of local features, which leads to the over-smoothing problem. LPD-GCN therefore contains a simple encoder-decoder mechanism for local feature reconstruction, in which the encoder consists of stacked graph convolutional layers and the decoder is a multi-layer perceptron that reconstructs the local node features. At the same time, an auxiliary local feature reconstruction loss is constructed to assist the graph classification objective. In this way, node features can be effectively preserved in the hidden representations at different layers.
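The following is a minimal sketch of this encoder-decoder idea: stacked graph convolutions encode the node features into hidden representations, an MLP decodes them back, and the reconstruction error serves as the auxiliary loss. It reuses the `SimpleGCNLayer` from the earlier sketch; the class name, layer count, and mean-squared-error reconstruction loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class NodeFeatureAutoencoder(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_layers: int = 3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        # Encoder: stacked graph convolutional layers (SimpleGCNLayer from the sketch above)
        self.encoder = nn.ModuleList(
            SimpleGCNLayer(dims[i], dims[i + 1]) for i in range(num_layers)
        )
        # Decoder: MLP mapping hidden node representations back to the input features
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        h = x
        for conv in self.encoder:
            h = conv(h, adj)
        x_rec = self.decoder(h)
        # Auxiliary local feature reconstruction loss
        rec_loss = nn.functional.mse_loss(x_rec, x)
        return h, rec_loss

# Usage (reusing adj and h from the previous toy example)
model = NodeFeatureAutoencoder(in_dim=8, hidden_dim=16)
hidden, rec_loss = model(h, adj)
```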
(2) Neighborhood aggregation based on DenseNet
In addition, to use information from neighborhoods at different layers flexibly, the model adds direct connections from each hidden convolutional layer to all higher-level convolutional layers and readout modules. This architecture roughly corresponds to DenseNets, which were originally proposed for computer vision problems. It allows selective aggregation of neighborhood information at different layers and further improves the flow of information between layers. Whereas DenseNets aggregate features by layer-wise concatenation, LPD-GCN adopts layer-wise accumulation, as sketched below.
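A hedged sketch of the densely connected encoder: each graph convolutional layer receives the accumulated (summed) outputs of all previous layers as its input, so information from neighborhoods at different depths can flow directly to later layers. The summation-based accumulation, the fixed hidden dimension, and the name `DenselyConnectedEncoder` are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    def __init__(self, dim: int, num_layers: int = 4):
        super().__init__()
        # All layers keep the same dimension so their outputs can be accumulated
        self.layers = nn.ModuleList(SimpleGCNLayer(dim, dim) for _ in range(num_layers))

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        outputs = [x]
        for layer in self.layers:
            # "Layered accumulation": sum the outputs of all previous layers
            dense_input = torch.stack(outputs, dim=0).sum(dim=0)
            outputs.append(layer(dense_input, adj))
        return outputs  # per-layer node representations, consumed by per-layer readouts

# Usage (adj from the earlier toy example; input already projected to the hidden size)
enc = DenselyConnectedEncoder(dim=16)
per_layer = enc(torch.randn(4, 16), adj)
```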
(3) Local node representation based on global information perception
After the auxiliary local feature reconstruction module is introduced, each convolutional layer receives additional supervision that helps maintain locality. However, this supervision cannot be used to train the global readout modules through backpropagation. In this model's architecture, each convolutional layer is followed by a corresponding global readout module that collapses the node embeddings of the entire graph into a graph-level representation. How, then, can the supervision signal from local feature reconstruction be better exploited? To solve this problem, a direct connection is added from each readout module to the next convolutional layer, and the node-level features are combined with the global graph-level features by concatenation; that is, each node representation and the graph-level representation are concatenated point-wise into a single tensor. In addition, a learnable parameter ε (> 0) is introduced to adaptively trade off the local node-level representation against the global graph-level representation. This graph-context-aware node representation can be written as

$$\tilde{h}_v^{(k)} = \mathrm{CONCAT}\left(h_v^{(k)},\; \epsilon \cdot h_G^{(k)}\right),$$

where $h_v^{(k)}$ is the representation of node $v$ at the $k$-th layer, $h_G^{(k)}$ is the graph-level representation produced by the $k$-th readout module, and $\epsilon$ is the learnable trade-off parameter.
With this design, in addition to the gradient signal produced by the main graph-level task loss, the gradient signal from the local feature reconstruction loss can also be backpropagated to update the readout parameters, which reduces the risk of losing local expressiveness and improves the generalization ability of the model. At the same time, each node representation is combined with the additional global context to form a global-context-aware local representation, which further enhances the node representations.
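A minimal sketch of this fusion step: the graph-level output of a readout module is scaled by a learnable ε and concatenated onto every node representation before the next convolutional layer. The exact formulation in the paper may differ; the softplus reparameterization (to keep ε > 0) and the name `ContextAwareFusion` are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable trade-off between local and global information, kept positive via softplus
        self.eps_raw = nn.Parameter(torch.zeros(1))

    def forward(self, h_nodes: torch.Tensor, h_graph: torch.Tensor) -> torch.Tensor:
        eps = nn.functional.softplus(self.eps_raw)
        # Broadcast the graph-level vector to every node and concatenate point-wise
        context = (eps * h_graph).expand(h_nodes.size(0), -1)
        return torch.cat([h_nodes, context], dim=-1)

# Usage: h_nodes is (N, d) node features, h_graph is the (d,) readout of the same layer
fusion = ContextAwareFusion()
fused = fusion(torch.randn(5, 16), torch.randn(16))
print(fused.shape)  # torch.Size([5, 32])
```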
(4) Global hierarchical aggregation based on self-attention mechanism
Most existing methods feed the node representations learned by multiple graph convolutional layers into a global readout module to generate the graph-level representation, typically through pooling or summation. However, as network depth increases, the node representations may become over-smoothed, degrading the quality of the graph-level output. To effectively extract and exploit global information at all depths, the model further adopts a self-attention mechanism that reads out graph-level features hierarchically, in a manner similar to GIN. The intuition behind this layer-centric self-attention is that, when generating the task-specific graph-level output, the attention weight assigned to each layer can adapt to the specific task.
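The following sketch illustrates one simple form of such a layer-wise attention readout: graph-level vectors produced at different depths are weighted by learned attention scores and combined into the final graph representation. The single-linear-layer scoring function and the name `LayerAttentionReadout` are assumptions; the concrete attention form used in the paper may differ.

```python
import torch
import torch.nn as nn

class LayerAttentionReadout(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one attention score per layer-wise graph vector

    def forward(self, per_layer_graph_feats: torch.Tensor) -> torch.Tensor:
        # per_layer_graph_feats: (num_layers, dim)
        weights = torch.softmax(self.score(per_layer_graph_feats), dim=0)  # (num_layers, 1)
        return (weights * per_layer_graph_feats).sum(dim=0)                # (dim,)

# Usage: combine graph-level features collected from 4 layers
readout = LayerAttentionReadout(16)
h_G = readout(torch.randn(4, 16))
print(h_G.shape)  # torch.Size([16])
```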
(5) Loss function
In the training phase, the LPD-GCN model receives gradient signals from both the main graph classification task and the auxiliary local feature reconstruction constraint. Formally, LPD-GCN is trained with a total loss defined as a weighted combination of the graph classification loss and the local feature reconstruction loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda \cdot \mathcal{L}_{\mathrm{rec}},$$

where $\mathcal{L}_{\mathrm{cls}}$ denotes the graph classification loss, $\mathcal{L}_{\mathrm{rec}}$ denotes the local feature reconstruction loss, and the trade-off parameter $\lambda$ is introduced adaptively to balance the two loss terms.
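A small sketch of this joint objective: the graph classification loss plus a weighted auxiliary node feature reconstruction loss. Here `lambda_rec` is a fixed stand-in for the paper's adaptive trade-off parameter, and the cross-entropy/MSE choices are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, x_rec, x, lambda_rec: float = 0.5):
    cls_loss = F.cross_entropy(logits, labels)  # graph classification loss
    rec_loss = F.mse_loss(x_rec, x)             # local feature reconstruction loss
    return cls_loss + lambda_rec * rec_loss

# Usage with toy tensors: 3 graphs, 2 classes, 10 nodes with 8-dimensional features
loss = total_loss(torch.randn(3, 2, requires_grad=True), torch.tensor([0, 1, 1]),
                  torch.randn(10, 8), torch.randn(10, 8))
loss.backward()
```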
4. Graph classification experiment results
(1) Test data set
This article uses 8 graph datasets commonly used in the graph neural network field, performs 10-fold cross-validation to evaluate performance, and reports the mean and standard deviation of the test accuracy.
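A brief sketch of this evaluation protocol: stratified 10-fold cross-validation with the mean and standard deviation of test accuracy reported. The helper `train_and_eval` is a hypothetical stand-in for training and evaluating LPD-GCN on one fold; it is not part of the paper's code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(labels, train_and_eval, n_splits: int = 10, seed: int = 0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs = []
    # labels alone are enough to build stratified splits; X is a placeholder
    for train_idx, test_idx in skf.split(np.zeros(len(labels)), labels):
        accs.append(train_and_eval(train_idx, test_idx))  # test accuracy of this fold
    return float(np.mean(accs)), float(np.std(accs))
```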
(2) Results on the test sets
The classification performance on multiple datasets is significantly improved, and the generalization ability is also improved.
5. References
[1] LIU W, GONG M, TANG Z, QIN A K. Locality preserving dense graph convolutional networks with graph context-aware node representations. https://arxiv.org/abs/2010.05404
[2] GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C] // Proceedings of the 34th International Conference on Machine Learning : Vol 70. 2017 : 1263 – 1272.
[3] XU K, HU W, LESKOVEC J, et al. How powerful are graph neural networks?[C] // Proceedings of the 7th International Conference on Learning Representations. 2019.
[4] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[C] // Proceedings of the 6th International Conference on Learning Representations. 2018.
[5] YING Z, YOU J, MORRIS C, et al. Hierarchical graph representation learning with differentiable pooling[C] // Advances in Neural Information Processing Systems. 2018 : 4800 – 4810.
[6] XU K, LI C, TIAN Y, et al. Representation learning on graphs with jumping knowledge networks[C] // Proceedings of the 35th International Conference on Machine Learning. 2018 : 5449 – 5458.