Over the course of putting graph neural networks into production, the Meituan Search and NLP team designed and developed Tulong, a graph neural network framework grounded in real business scenarios, together with a supporting graph learning platform, improving both the scale and the iteration efficiency of models. This article introduces our thinking and key design decisions around model abstraction, the basic framework, performance optimization, and upper-level tooling, in the hope of inspiring or helping readers engaged in related work.
1. Introduction
Everything is connected. As a general data structure, a graph describes relationships between entities well: in social networks, graphs represent friendships between users; on e-commerce sites, graphs represent click and purchase behavior between users and products; in knowledge graph construction, graphs represent the various relationships between entities. Meanwhile, deep learning has achieved great success in computer vision, natural language processing, and speech processing. Deep learning converts diverse data such as images, text, and speech into dense vector representations, providing another way to represent data. With increasingly powerful hardware, deep learning can learn complex and diverse correlations from massive data.
This raises a natural question: can deep learning be applied to a broader class of data, such as graphs? In fact, even before the rise of deep learning, the industry had begun exploring graph embedding techniques [1]. Early graph embedding algorithms were mostly based on heuristic matrix factorization and probabilistic graphical models; later came "shallow" neural network models represented by DeepWalk [2] and Node2vec [3]; finally, a line of research represented by GCN [4] broke down the barrier between graph signal processing and neural networks, establishing the message-passing paradigm on which current graph neural network (GNN) models are built.
In recent years, graph neural networks have become one of the research hotspots in academia [5]. In industry, they also see wide application in e-commerce search, recommendation, online advertising, financial risk control, traffic estimation, and other fields, and have brought significant benefits.
Because graph data is inherently sparse (only a small fraction of all node pairs are connected by edges), training directly on general-purpose deep learning frameworks such as TensorFlow and PyTorch tends to perform poorly. As the saying goes, to do a good job one must first sharpen one's tools. Deep learning frameworks dedicated to graph neural networks have emerged accordingly: open-source frameworks such as PyG (PyTorch Geometric) [6, 18] and DGL (Deep Graph Library) [7, 17] greatly improve GNN training speed, reduce resource consumption, and enjoy active community support. Many companies have also built their own graph neural network frameworks around their business characteristics. In long-term production practice, the Meituan Search and NLP team has accumulated experience and made substantial optimizations in training scale and performance, functional richness, and ease of use. This article first introduces the practical problems and challenges we encountered in past applications, and then presents our solutions.
1.1 Issues and challenges
From the perspective of industrial application, an easy-to-use graph neural network framework should have at least the following characteristics.
(1) Comprehensive support for currently popular graph neural network models.
By the type of the underlying graph, graph neural network models can be divided into homogeneous graph, heterogeneous graph, dynamic graph, and other variants. By training method, they can be divided into full-graph message passing [4] and subgraph-sampling-based message passing [8]. By inference style, they can be divided into transductive and inductive [9].
In addition to classical node classification, link prediction, and graph classification, downstream tasks include many domain-specific end-to-end prediction tasks. In practice, different business scenarios place different requirements on the GNN model and the downstream task, and customization is needed. For example, a food recommendation scenario has nodes such as users, merchants, and dishes, whose relationships can be described with homogeneous or heterogeneous graphs; to capture users' preferences at different times, a dynamic graph model may also be used; and the recall and ranking stages of a recommendation system require differently designed training tasks. Although existing frameworks provide implementations of common models, simply invoking them cannot satisfy such requirements. Users then have to write their own model and training-process code, which brings extra work. How to help users implement custom models more conveniently is a major challenge.
(2) Support model training on large-scale graphs at a reasonable cost.
In production, graphs are often very large, reaching billions or even tens of billions of edges. In our initial attempts, we found that with existing frameworks, models on graphs with tens of billions of edges could only be trained in a distributed environment, consuming a great deal of hardware (thousands of CPU cores and terabytes of memory). We hoped that a single machine could train a model at the scale of tens of billions of edges in a reasonable time, reducing the demand on hardware resources.
(3) Seamless integration with business systems.
The complete implementation pipeline of a graph neural network includes at least: constructing graphs from business data, offline model training and evaluation, online inference, and observation of business metrics. Successfully landing GNN technology requires a thorough understanding of business logic and requirements, plus unified and efficient management of the business scenario. Taking food recommendation as an example again, online logs record behavioral events such as impressions, clicks, and orders, while the knowledge graph provides rich attributes of merchants and dishes. How to construct a graph from such heterogeneous data must be decided in light of the actual business and confirmed by experiment. Appropriate tools can improve the efficiency of integrating business data, yet most existing GNN frameworks focus on offline training and evaluation and lack such tools.
(4) Easy for R&D engineers to get started with, while providing sufficient extensibility.
From the perspective of R&D efficiency, the purpose of building our own graph neural network framework is to reduce repetitive modeling work so that engineers can focus on the characteristics of the business itself. A good framework should therefore be easy to use, completing most tasks through simple configuration, while still offering appropriate support for special modeling needs.
1.2 Meituan's solution
In the long-term implementation of search, recommendation, advertising, delivery, and other businesses, the Meituan Search and NLP team has accumulated practical experience and independently designed and developed the graph neural network framework Tulong and its supporting graph learning platform, which address the problems above.
- First, we performed a fine-grained analysis of currently popular graph neural network models, distilled a set of sub-operations from them, and implemented a general model framework. Many existing GNN models can be instantiated with simple configuration changes.
- For subgraph-sampling-based training, we developed the graph computing library MTGraph, which greatly optimizes the memory usage of graph data and the speed of subgraph sampling. In a single-machine environment, training is roughly 4x faster than DGL and memory usage is reduced by about 60%; a single machine can train graphs with a billion nodes and ten billion edges.
- Around the Tulong framework, we built a one-stop graph learning platform that provides developers with graphical tools covering the whole process: business data access, graph data construction and management, model training and evaluation, and model export and deployment.
- Tulong makes training and evaluation highly configurable: everything from parameter initialization to learning rate, from model structure to the type of loss function, can be controlled through a set of configuration files. For common business scenarios we have distilled several training templates, and developers can adapt to most scenarios by modifying the configuration. For example, many businesses fluctuate periodically with the noon and evening peaks, so we designed a training template for periodic dynamic graphs, which produces different GNN representations for different periods of the day. In Meituan's delivery business, GNN representations for different time periods must be generated for each region as input features to downstream prediction tasks; during development, it took only three days from starting to modify the configuration to delivering the first version of the model, whereas previously implementing a similar model from scratch took about two weeks.
2. System overview
As shown in Figure 1 below, Tulong, together with its supporting graph computing library and graph learning platform, constitutes a complete system. From bottom to top, the system comprises the following three components.
(1) Graph and deep learning engine
We divide the underlying operators of graph neural networks into three categories: graph structure queries, sparse tensor computation, and dense tensor computation. We developed the graph computing library MTGraph to provide graph data storage and query functions, with deep optimizations for memory usage and subgraph sampling speed. MTGraph is compatible with PyTorch and DGL, so users can write DGL-based model code directly on top of MTGraph.
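Because MTGraph preserves DGL's interfaces, model code written against DGL should run on top of it unchanged. Below is a minimal sketch using only standard DGL and PyTorch APIs; the toy graph stands in for data that MTGraph would serve in practice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.nn import SAGEConv

class TwoLayerSAGE(nn.Module):
    """A plain two-layer GraphSAGE encoder written against DGL's API."""
    def __init__(self, in_feats, hidden_feats, out_feats):
        super().__init__()
        self.conv1 = SAGEConv(in_feats, hidden_feats, aggregator_type="mean")
        self.conv2 = SAGEConv(hidden_feats, out_feats, aggregator_type="mean")

    def forward(self, g, x):
        h = F.relu(self.conv1(g, x))
        return self.conv2(g, h)

# A toy 4-node graph; in practice the graph would come from MTGraph's storage.
g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
model = TwoLayerSAGE(in_feats=16, hidden_feats=32, out_feats=8)
out = model(g, torch.randn(4, 16))  # -> shape (4, 8)
```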
(2) Tulong framework
The Tulong framework first encapsulates the basic components required to train graph neural networks: graph and feature data preprocessing, subgraph samplers, a general GNN model framework, and basic tasks including training and evaluation. On top of these components, Tulong provides rich predefined models and training/inference flows; users can train and evaluate GNN models on business data simply by modifying configuration files.
(3) Graph Learning Platform
The graph learning platform is designed to simplify offline model development and iteration, as well as integration with business systems. It provides a series of visual tools that streamline the entire process from business data access to model deployment.
The following sections will introduce the analysis and design of each module in detail from four aspects: model framework, training process framework, performance optimization, and graph learning platform.
3. Model Framework
From the perspective of engineering implementation, we summarize the basic paradigm of current mainstream graph neural network models and implement a general framework covering a variety of GNN models. The following subsections discuss each graph type (homogeneous, heterogeneous, and dynamic) in turn.
3.1 Homogeneous graph
3.2 Heterogeneous graph
Compared with a homogeneous graph, a heterogeneous graph (Heterogeneous Graph) has multiple node types and edge types. For example, an academic citation network [13] contains nodes of types such as paper, author, and institution, connected by edges of types such as "paper cites paper", "author writes paper", and "author belongs to institution", as shown in Figure 2 below:
We treat a heterogeneous graph as a superposition of multiple bipartite graphs, one per edge type. The academic citation network above can be expressed as three bipartite graphs: "paper-cites-paper", "author-writes-paper", and "author-belongs to-institution", and the homogeneous-graph GNN framework can be applied to each bipartite graph with minor modifications, as the sketch below illustrates.
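The "superposition of bipartite graphs" view can be made concrete with DGL's standard heterograph API; the node IDs below are toy values, and the edge-type names merely mirror the citation-network example:

```python
import torch
import dgl

# Each (source type, edge type, destination type) triple is one bipartite graph;
# the heterogeneous citation network is the superposition of the three.
g = dgl.heterograph({
    ("paper",  "cites",      "paper"):       (torch.tensor([0, 1]),    torch.tensor([1, 2])),
    ("author", "writes",     "paper"):       (torch.tensor([0, 0, 1]), torch.tensor([0, 1, 2])),
    ("author", "belongs_to", "institution"): (torch.tensor([0, 1]),    torch.tensor([0, 0])),
})

# A homogeneous-graph GNN layer can then be applied per edge type, on each
# bipartite slice of the heterograph:
for etype in g.canonical_etypes:
    subg = g[etype]  # the bipartite graph for this edge type
    print(etype, subg.num_edges())
```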
3.3 Dynamic graph
The computing paradigms of homogeneous, heterogeneous, and dynamic graphs are analyzed above. From them we extract general functions (operators), including message functions, aggregation functions, update functions, and neighbor-node functions, and provide a variety of predefined implementations. Framework users can assemble these operators through configuration options to build the GNN model they need.
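To make the operator decomposition concrete, here is an illustrative sketch of a layer assembled from pluggable message, aggregation, update, and neighbor functions. The names and interfaces are our own illustration, not Tulong's actual API:

```python
import torch

# Pluggable operators: each predefined implementation could be swapped via config.
def message_copy(src_h):                 # message function: what flows along an edge
    return src_h

def aggregate_mean(msgs):                # aggregation function: combine incoming messages
    return msgs.mean(dim=0)

def update_residual(node_h, agg):        # update function: fuse self state with aggregate
    return node_h + agg

def neighbors_all(adj, v):               # neighbor function: full (or sampled) neighborhood
    return adj.get(v, [])

class ConfigurableLayer:
    """Illustrative message-passing layer assembled from pluggable operators."""
    def __init__(self, message_fn, aggregate_fn, update_fn, neighbor_fn):
        self.message_fn = message_fn
        self.aggregate_fn = aggregate_fn
        self.update_fn = update_fn
        self.neighbor_fn = neighbor_fn

    def forward(self, h, adj):
        out = []
        for v in range(h.size(0)):
            nbrs = self.neighbor_fn(adj, v)
            if nbrs:
                msgs = torch.stack([self.message_fn(h[u]) for u in nbrs])
                agg = self.aggregate_fn(msgs)
            else:
                agg = torch.zeros_like(h[v])  # isolated node: nothing to aggregate
            out.append(self.update_fn(h[v], agg))
        return torch.stack(out)

layer = ConfigurableLayer(message_copy, aggregate_mean, update_residual, neighbors_all)
h = torch.randn(3, 4)
adj = {0: [1, 2], 1: [0], 2: [0]}
print(layer.forward(h, adj).shape)  # torch.Size([3, 4])
```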
4. Training process framework
Training a GNN model usually involves loading data, defining the GNN model, training and evaluation, and exporting the model. Because GNN models and training tasks are so diverse, in practice users often have to write model and pipeline code for their own scenarios, and the tedious low-level details make it hard to focus on tuning the algorithm itself. GraphGym [12] and DGL-Go [16] attempt to solve this problem by integrating multiple models and training tasks behind simplified interfaces, allowing users to get started and train GNN models more directly.
We address the problem in a more "industrial" way (as shown in Figure 6 below): the framework is split into two layers, basic components and process components. A basic component focuses on a single function; for example, the graph data component only maintains the graph data structure in memory and provides neither on-graph sampling nor tensor computation, while on-graph sampling is provided by a separate graph sampler. A process component assembles basic components into a relatively complete pipeline for data preprocessing, training, and evaluation; for example, the training process combines the graph data, graph sampler, and GNN model components to provide a complete training loop.
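Schematically, the two-layer decomposition might look like the toy sketch below; all class names here are hypothetical illustrations, not Tulong's actual API:

```python
import torch
import torch.nn as nn

class GraphData:
    """Basic component: only holds the graph structure (here, a dict adjacency)."""
    def __init__(self, adj):
        self.adj = adj

class GraphSampler:
    """Basic component: only samples subgraphs (here, truncated 1-hop neighbor lists)."""
    def __init__(self, graph, fanout):
        self.graph, self.fanout = graph, fanout
    def sample(self, seeds):
        return {v: self.graph.adj.get(v, [])[: self.fanout] for v in seeds}

class TrainingFlow:
    """Process component: assembles graph data, sampler, and model into a loop."""
    def __init__(self, sampler, model, loss_fn):
        self.sampler, self.model, self.loss_fn = sampler, model, loss_fn
    def run(self, feats, seed_batches, optimizer):
        for seeds, labels in seed_batches:
            block = self.sampler.sample(seeds)          # subgraph for this batch
            # Toy "GNN": mean of each seed's sampled neighbor features.
            agg = torch.stack([feats[nbrs].mean(0) for nbrs in block.values()])
            loss = self.loss_fn(self.model(agg), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

graph = GraphData({0: [1, 2], 1: [0, 2], 2: [0, 1]})
model = nn.Linear(4, 2)
flow = TrainingFlow(GraphSampler(graph, fanout=2), model, nn.CrossEntropyLoss())
feats = torch.randn(3, 4)
batches = [([0, 1], torch.tensor([0, 1]))]
flow.run(feats, batches, torch.optim.SGD(model.parameters(), lr=0.1))
```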
Going a step further, we provide a variety of pipeline configuration templates and GNN model templates. A template exposes several hyperparameters, such as the training data path, model type, and learning rate; combined with the hyperparameters specified by the user, it completely defines a training task. In other words, a GNN experiment can be fully reproduced from its template plus parameters. The framework parses these configurations and generates an executable application.
For example, a user can select the GraphSage model configuration template and the link-prediction training template, specify the number of layers and dimensions of the model and the training and evaluation data paths, and then start training a GraphSage-based link prediction model.
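As an illustration of the "template plus hyperparameters" idea, such an experiment might be pinned down by a configuration like the sketch below. The field names and paths are invented for illustration; they are not Tulong's actual schema:

```python
# Hypothetical configuration; field names illustrate the idea only.
experiment = {
    "model_template": "graphsage",          # which predefined model to instantiate
    "task_template": "link_prediction",     # which training/evaluation flow to run
    "model": {"num_layers": 2, "hidden_dim": 128, "aggregator": "mean"},
    "sampler": {"fanouts": [10, 25]},       # neighbors sampled per layer
    "train": {
        "train_data": "hive://db/train_edges",   # made-up paths
        "eval_data": "hive://db/eval_edges",
        "learning_rate": 1e-3,
        "epochs": 10,
    },
}
# A framework in this style parses the config, assembles the sampler, model,
# and training flow, and runs the task; template + parameters are enough to
# reproduce the experiment exactly.
```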
5. Performance optimization
As the business develops, the graphs in business scenarios grow ever larger. How to efficiently train GNN models on graphs with billions or even tens of billions of edges at reasonable cost became an urgent problem. We solve it by optimizing the single-machine memory footprint and by optimizing the subgraph sampling algorithms.
5.1 Graph data structure optimization
The memory footprint of the graph data structure is a key factor limiting the size of trainable graphs. Taking the MAG240M-LSC dataset [13] as an example, after adding reverse edges the graph has 240 million nodes and 3.5 billion edges. With subgraph-sampling-based training, the single-machine graph data structures of PyG and DGL occupy more than 100 GB of memory, and other open-source frameworks often occupy even more. On larger business graphs, the memory footprint frequently exceeds the hardware configuration. We therefore designed and implemented a more compact graph data structure, increasing the graph scale a single machine can hold.
We use graph compression to reduce memory usage. Unlike conventional graph compression problems, the GNN scenario must support random queries: for example, querying the neighbors of a given node, or checking whether two given nodes are connected. Our solution consists of two parts:
- Graph data preprocessing and compression: We first analyze the statistical characteristics of the graph and cluster and renumber nodes in a lightweight way, so that nodes with nearby IDs have similar neighborhood structure. We then reorder the edges, encode the edge data in blocks, and produce a graph data file organized as "node - block index - adjacent edges" (as shown in Figure 7 below). Finally, if the data contains node or edge features, the features must also be aligned with the compressed graph.
- Random queries on the graph: A query proceeds in two steps: first locate the required edge data block, then decompress the block in memory and read the queried data. For example, when querying whether nodes $u$ and $v$ are connected, we compute the address of the edge data block from the two node IDs, decompress the block to obtain a small number of candidate adjacent edges (usually no more than 16), and then check whether it contains the edge $(u, v)$, as the sketch below shows.
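A simplified sketch of the two-step query follows. The block layout and codec here are stand-ins for the real format, which the article does not describe in detail:

```python
import struct
import zlib

def decode_edges(raw):
    # Toy codec: each destination node ID is a 4-byte big-endian unsigned int.
    return struct.unpack(f">{len(raw) // 4}I", raw)

def has_edge(u, v, block_index, blob):
    """Check whether edge (u, v) exists, given block-compressed adjacency data.

    block_index maps a node to the (offset, length) of the compressed block
    that could contain its edge to v; blob is the packed compressed edge data.
    """
    offset, length = block_index[u]
    candidates = decode_edges(zlib.decompress(blob[offset:offset + length]))
    # A block holds only a few candidate edges (typically <= 16), so a linear
    # scan after decompression is cheap.
    return v in candidates

# Toy usage: node 0's edges {3, 7} stored as one compressed block.
blob = zlib.compress(struct.pack(">2I", 3, 7))
print(has_edge(0, 7, {0: (0, len(blob))}, blob))  # True
print(has_edge(0, 5, {0: (0, len(blob))}, blob))  # False
```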
After compression, loading the MAG240M-LSC dataset requires only 15 GB of memory. The memory usage of graphs with tens of billions or even a hundred billion edges drops markedly, to a level a single machine can carry, as shown in Figure 8 below:
5.2 Subgraph Sampling Optimization
Subgraph sampling is one of the performance bottlenecks of GNN model training; on some business graphs we found it accounted for more than 80% of total training time. We designed and implemented several efficient neighbor sampling algorithms for static and dynamic graphs. The main optimizations include:
- Random number generator: Unlike applications such as cryptography, sampling on graphs places no strict requirements on the "randomness" of the generator. By appropriately relaxing the randomness requirement, we designed and implemented a much faster random number generator that can be applied directly to sampling with and without replacement (see the sketch after this list).
- Probability quantization: In weighted sampling, we quantize the floating-point probability values into compact integers at an acceptable loss of precision. This not only reduces the sampler's memory consumption but also turns some floating-point operations into integer operations.
- Timestamp index: Subgraph sampling on dynamic graphs must bound the time range of edges. The sampler first builds an index over edge timestamps; during sampling, it uses the index to determine the range of edges eligible for sampling before performing the actual sampling.
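The sketch below illustrates the first two ideas with a classic xorshift generator and integer-quantized cumulative weights; the constants and quantization level are illustrative choices, not Tulong's actual implementation:

```python
import bisect

MASK64 = (1 << 64) - 1

def xorshift64(x):
    """Marsaglia xorshift64: fast, non-cryptographic, good enough for sampling."""
    x ^= (x << 13) & MASK64
    x ^= x >> 7
    x ^= (x << 17) & MASK64
    return x

def quantize_weights(weights, levels=1 << 16):
    """Turn float edge weights into a compact integer cumulative table."""
    scale = levels / max(weights)
    cumulative, total = [], 0
    for w in weights:
        total += max(1, int(w * scale))  # keep every edge reachable
        cumulative.append(total)
    return cumulative

def weighted_pick(neighbors, cumulative, state):
    """One weighted draw (with replacement) using integer arithmetic only."""
    state = xorshift64(state)
    r = state % cumulative[-1]
    return neighbors[bisect.bisect_right(cumulative, r)], state

# For dynamic graphs, a sorted timestamp index would first bound the candidate
# range, e.g.: lo = bisect.bisect_left(edge_ts, t0); hi = bisect.bisect_right(edge_ts, t1)

state = 0x9E3779B97F4A7C15  # any nonzero seed
neighbors = [10, 11, 12]
cumulative = quantize_weights([0.5, 0.3, 0.2])
picked, state = weighted_pick(neighbors, cumulative, state)
print(picked)
```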
With the above optimizations, subgraph sampling is 2 to 4 times faster than DGL (as shown in Figure 9 below). On business graph A (200 million nodes, 4 billion edges), training took 2.5 hours/epoch with DGL and 0.5 hours/epoch after optimization. Business graph B (250 million nodes, 12.4 billion edges) could previously only be trained in a distributed setting, at 6 hours/epoch; after optimization it can be trained on a single machine at 2 hours/epoch.
6. Graph Learning Platform
The graph learning platform is designed to simplify the iteration of offline model development as well as integration with business systems. A complete model development iteration includes at least three stages: preparing the dataset, defining the model and training task, and training and evaluating the model. We analyzed users' needs in these three stages and provide corresponding tools to improve development efficiency:
- Dataset management: Constructing a graph from business data is the first step of model development. The platform provides Spark-based graph construction, converting business data stored in Hive into Tulong's custom graph data format (a minimal Spark sketch follows this list). Business data is usually stored as event logs, and there are many ways to abstract a graph from them. For example, in a recommendation scenario the business log contains users' click and order records on merchants; besides modeling the "user-clicks-merchant" events directly as edges, one can also model co-click relationships between merchants within a short time window, or bring in additional data such as a merchant's location and the dishes it sells. Which graph construction scheme works best must be determined by experiment. To this end, the platform provides a visual graph construction tool (as shown in Figure 10 below) that helps users lay out construction schemes, together with version management for graph datasets, making it easy to compare the effects of different schemes.
- Experiment management: Once the graph data is fixed, the modeling scheme and training strategy determine the final effect. Which GNN model should be used? How should the loss function be chosen? How should model and training hyperparameters be set? These questions also require extensive experiments to answer. Built on the Tulong framework, the modeling scheme and training strategy are controlled by a set of configurations; the platform provides a visual configuration editor and version management, making it convenient to compare the pros and cons of different schemes.
- Process management: With graph datasets and modeling/training schemes in hand, the whole pipeline still needs to be automated. This is a prerequisite for putting models online, and it also helps team members reproduce one another's solutions. The platform provides automated scheduling of the common "construct, train, evaluate, export" pipeline, reusing results from earlier stages where appropriate; for example, if the dataset definition is unchanged, the Spark graph-construction stage can be skipped and existing graph data used directly. The platform also integrates graph construction and modeling schemes and supports scheduled runs, meeting the needs of getting models online.
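As a rough illustration of Spark-based graph construction, the snippet below turns a hypothetical Hive click log into a deduplicated edge table. All table, column, and path names are made up; only the standard PySpark APIs are real:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("graph_construction")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive log table: one row per click event.
clicks = spark.table("dw.user_click_log").where("dt = '2022-01-01'")

# "user -clicks-> merchant" edges, deduplicated.
edges = (clicks
         .selectExpr("user_id AS src", "merchant_id AS dst")
         .distinct())

# In practice the output would then be converted into the framework's own
# graph data format; plain Parquet is used here as a stand-in.
edges.write.mode("overwrite").parquet("hdfs://example/graph/user_clicks_merchant")
```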
7. Summary
This article introduced the Meituan Search and NLP team's practical experience in building a graph neural network framework, including our thinking and key designs around GNN model abstraction, the basic framework, performance optimization, and upper-level tooling. The framework's design stems from concrete problems encountered in production, such as optimization for large-scale graphs and process management in multi-person collaboration, while also drawing on the latest academic research, such as computing paradigms for dynamic graphs. Beyond the technical work, the framework also benefited from close cooperation between the engineering and algorithm teams; a shared, in-depth understanding kept the project moving smoothly.
With the help of the Tulong framework, graph neural network technology has been applied in multiple Meituan business scenarios across search, recommendation, advertising, and delivery, achieving considerable business benefits. We believe graph neural networks have even broader application prospects, and that a graph neural network framework, as infrastructure, deserves continuous optimization and improvement.
8. About the author
Fu Hao, Xianpeng, Xiangzhou, Yuji, Xu Hao, Mengdi, Wuwei, and others are from the Meituan Platform / Search and NLP Department.
9. References
- [1] Cai, Hongyun, Vincent W. Zheng, and Kevin Chen-Chuan Chang. "A comprehensive survey of graph embedding: Problems, techniques, and applications." IEEE Transactions on Knowledge and Data Engineering 30, no. 9 (2018): 1616-1637.
- [2] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701-710. 2014.
- [3] Grover, Aditya, and Jure Leskovec. "Node2vec: Scalable feature learning for networks." In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855-864. 2016.
- [4] Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." International Conference on Learning Representations (2017).
- [5] Wu, Zonghan, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. "A comprehensive survey on graph neural networks." IEEE Transactions on Neural Networks and Learning Systems 32, no. 1 (2020): 4-24.
- [6] https://github.com/pyg-team/pytorch_geometric
- [7] https://www.dgl.ai/
- [8] Chen, Jie, Tengfei Ma, and Cao Xiao. "FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling." In International Conference on Learning Representations (2018).
- [9] Hamilton, Will, Zhitao Ying, and Jure Leskovec. "Inductive representation learning on large graphs." Advances in neural information processing systems 30 (2017).
- [10] Xu, Keyulu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. "Representation learning on graphs with jumping knowledge networks." In International Conference on Machine Learning, pp. 5453-5462. PMLR, 2018.
- [11] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9, no. 8 (1997): 1735-1780.
- [12] https://github.com/snap-stanford/GraphGym
- [13] https://ogb.stanford.edu/
- [14] Sankar, Aravind, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. "Dysat: Deep neural representation learning on dynamic graphs via self-attention networks." In Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 519-527. 2020.
- [15] Xu, Da, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. "Inductive representation learning on temporal graphs." International Conference on Learning Representations (2020).
- [16] https://github.com/dmlc/dgl/tree/master/dglgo
- [17] Wang, Minjie, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou et al. "Deep graph library: A graph-centric, highly-performant package for graph neural networks." arXiv preprint arXiv:1909.01315 (2019).
- [18] Fey, M., and Lenssen, J. E. "Fast graph representation learning with PyTorch Geometric." In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
- [19] Schlichtkrull, Michael, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. "Modeling relational data with graph convolutional networks." In European Semantic Web Conference, pp. 593-607. Springer, Cham, 2018.
Job Offers
The Meituan Search and NLP Department / NLP Center is the core team responsible for R&D of Meituan's artificial intelligence technologies. Its mission is to build world-class natural language processing core technology and service capabilities, relying on NLP (natural language processing), deep learning, knowledge graph, and other technologies to process Meituan's massive text data and provide intelligent text semantic understanding services for Meituan's businesses. The NLP Center has long-term openings for natural language processing and machine learning algorithm experts; interested candidates can send resumes to tech@meituan.com (email subject: Meituan Search and NLP Department).