AI Open Source in the Eyes of Experts | Wang Minjie: Exploration and Research of Deep Graph Learning in Artificial Intelligence - Amazon Cloud Developer

On June 26, Amazon Cloud Technology Community Day was held in Shanghai. An Amazon Cloud Technology chief developer evangelist, a senior data scientist, a senior applied scientist, and an Amazon Cloud Technology Machine Learning Hero all attended to share and discuss technology trends and practical projects in AI open source.

In the last issue, we shared the talk by the Amazon Cloud Technology chief developer evangelist on building an open source machine learning ecosystem. In this issue, we bring you the talk by Wang Minjie, Senior Applied Scientist at the Amazon Shanghai Institute of Artificial Intelligence, on the exploration and research of applying deep learning algorithms to graph data. Amazon Cloud Developer will bring more guest talks in follow-up posts.


Wang Minjie: Exploration and Research of Deep Graph Learning in Artificial Intelligence

Any discussion of deep graph learning in artificial intelligence must first clarify one concept: what is artificial intelligence? Wang Minjie believes that two points are essential to realizing true artificial intelligence. The first is to understand why current artificial intelligence algorithms make mistakes, and the second is to explore the structural consistency between artificial intelligence algorithms and the human brain.

"Studies have shown that the order of Chinese characters does not necessarily affect reading." In the original Chinese, the characters of this sentence are scrambled, yet readers typically notice only after they have finished reading it. People do not process natural language in a strictly linear way; they take in text in chunks. Many models, by contrast, process text as a linear sequence.

Image recognition shows the same gap. Given a picture of a dog sitting on a motorcycle, an algorithm can only recognize that the picture contains a dog and a motorcycle; it extracts no further structured information. A human, by contrast, immediately sees what makes the picture amusing.

Much real-world data takes the form of a graph, from microscopic molecules to the networks found in industry and everyday life. Running machine learning tasks on graphs is therefore an extremely common requirement.

In recent years, how to apply deep learning algorithms to graph data has become a focus of attention for developers, which led to the birth of Graph Neural Networks (GNNs). A graph neural network is a class of deep neural network that learns vector representations of nodes, edges, or an entire graph, and its core idea is message passing. For example, to judge which NBA team a person likes, you can look at which teams his friends on social networks like. If 80% of his friends like a particular team, he very probably likes that team too. Modeling a node means collecting information from its adjacent nodes, and this process is message passing.

The messages from all adjacent nodes are collected and combined by a weighted sum, and the resulting aggregated message is passed to an update function that updates the node's existing information. This is the most basic mathematical model of a graph neural network.
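The aggregate-then-update step described above can be sketched in plain Python. This is a minimal illustration rather than the formulation used in the talk: the edge weights, the toy feature values, and the update rule (average the old state with the aggregate, then apply ReLU) are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: Dict[str, Vector],
    in_edges: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, Vector]:
    """Run one round of message passing over every node."""
    new_features: Dict[str, Vector] = {}
    for node, own in features.items():
        # Aggregate: weighted sum of the features sent by adjacent nodes.
        aggregated = [0.0] * len(own)
        for neighbor, weight in in_edges.get(node, []):
            for i, value in enumerate(features[neighbor]):
                aggregated[i] += weight * value
        # Update (assumed rule): average old state and aggregate, then ReLU.
        new_features[node] = [
            max(0.0, 0.5 * old + 0.5 * agg) for old, agg in zip(own, aggregated)
        ]
    return new_features


# Toy graph: node "a" receives from "b" and "c"; "b" receives from "a".
features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, -4.0]}
in_edges = {"a": [("b", 1.0), ("c", 0.5)], "b": [("a", 1.0)], "c": []}
updated = message_passing_step(features, in_edges)
```

In a real graph neural network the weights and the update function are learned, and several such rounds are stacked so that information travels more than one hop.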

Graph neural networks have a very wide range of applications across different fields.

Molecular Medicine: The first application is predicting molecular properties. The input is a graph of the molecular structure; message passing through a graph neural network produces a vector representation, which is fed to a downstream classifier to judge the properties and toxicity of a chemical. The second is generating drug molecules. An encoder model is built first, a graph neural network turns its input into a vector representation, and guidance is then added so that the generated molecules have the properties we need. The third is drug repurposing. Here Amazon has built DRKG, a drug knowledge graph that represents the relationships among entities such as drugs, disease proteins, and compounds. Modeling this data with a graph neural network can predict links between drug nodes and disease protein nodes, and thereby suggest potential drugs for treating novel diseases. So far, 11 of the 41 drugs recommended by this graph neural network modeling have been used in clinical practice.
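Link prediction of this kind usually comes down to scoring candidate pairs of nodes from their learned vectors. The sketch below assumes a plain dot-product score and invented two-dimensional embeddings; the source does not say which scoring function or embeddings were used for DRKG, and the drug names are placeholders.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_candidate_drugs(
    drug_vectors: Dict[str, List[float]],
    target_vector: List[float],
) -> List[Tuple[str, float]]:
    """Score every drug against one target node and sort best first."""
    scored = [(name, dot(vec, target_vector)) for name, vec in drug_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a trained model would produce these vectors.
drug_vectors = {
    "drug_a": [0.9, 0.1],
    "drug_b": [0.2, 0.8],
    "drug_c": [-0.5, 0.3],
}
target_vector = [1.0, 0.0]  # hypothetical disease protein embedding
ranking = rank_candidate_drugs(drug_vectors, target_vector)
```

The highest-scoring pairs are the predicted links, which is how a shortlist of candidate drugs could be drawn from the graph.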

Knowledge Graph: Graph neural networks can be used for many downstream tasks on a knowledge graph, such as knowledge graph completion and node classification.

Recommendation system: Mainstream recommendation systems rely mainly on interaction data between users and products. When user A buys a product, the system stores a purchase record. If data analysis shows that user B's purchase records are similar to user A's, then user B is very likely to be interested in the products that user A bought. Recommendation systems based on graph neural networks have already been deployed commercially.
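The interaction-based reasoning in this paragraph can be sketched as a small similarity lookup over purchase records. This illustrates the classic approach, not a graph neural network recommender; the Jaccard measure, the 0.5 threshold, and the sample users are assumptions made for the example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two purchase sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(
    purchases: Dict[str, Set[str]],
    user: str,
    min_similarity: float = 0.5,
) -> List[str]:
    """Suggest items bought by users whose purchase history is similar."""
    own = purchases[user]
    scores: Dict[str, float] = {}
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity < min_similarity:
            continue
        # Credit every item the similar user bought that this user has not.
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda item: (-scores[item], item))


purchases = {
    "user_a": {"tent", "stove", "lantern"},
    "user_b": {"tent", "stove"},
    "user_c": {"novel"},
}
recommendations = recommend(purchases, "user_b")
```

Users and products form a bipartite graph with purchases as edges, which is why the same data lends itself to graph neural network models.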

Computer Vision: A scene graph is taken as input, modeled with a graph neural network, and followed by an image generator. In this way the scene graph can be used in reverse to generate better pictures.

Natural Language Processing: Graph structure is also ubiquitous in natural language processing. TreeLSTM is one example: a sentence is not a linear structure but has a grammatical structure, and training on the sentence's syntax tree yields a better analysis model. The currently popular Transformer is also a variant of a deep graph model.

Both academia and industry already have very good graph neural network deployments, but many problems remain to be solved. How should models be built as graphs grow larger and larger? How can structured data be extracted from unstructured data? Answering these questions requires good tools for developing models.

Writing a graph neural network with a traditional deep learning framework (TensorFlow, PyTorch, MXNet, and so on) is not an easy task. Message passing is a fine-grained computation, while the tensor programming interface expects coarse-grained computation, and this mismatch makes graph neural networks very difficult to write. DGL (Deep Graph Library) serves as a bridge across this gap. Wang Minjie introduced DGL from three angles: programming interface design, underlying system optimization, and open source community building.

First is the programming interface design. DGL programs in terms of graphs, and the graph is its core concept. Wang Minjie believes that developers should first understand that graphs are "first-class citizens" of graph neural networks. "First-class citizen" here means that all DGL functions and NN modules can accept and return graph objects, including the core message passing API.
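To make the "first-class citizen" idea concrete, here is a toy sketch in which every function accepts a graph object and returns one. The `ToyGraph` class and both functions are hypothetical stand-ins written for this illustration; they are not DGL's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyGraph:
    """Hypothetical graph object: edges plus one scalar feature per node."""

    edges: List[Tuple[int, int]]  # (source, destination) pairs
    ndata: Dict[int, float] = field(default_factory=dict)


def add_self_loops(graph: ToyGraph) -> ToyGraph:
    """Graph in, graph out: add a loop edge to every node that lacks one."""
    loops = [(n, n) for n in graph.ndata if (n, n) not in graph.edges]
    return ToyGraph(edges=graph.edges + loops, ndata=dict(graph.ndata))


def sum_neighbors(graph: ToyGraph) -> ToyGraph:
    """Graph in, graph out: each node sums the features on incoming edges."""
    totals = {node: 0.0 for node in graph.ndata}
    for src, dst in graph.edges:
        totals[dst] += graph.ndata[src]
    return ToyGraph(edges=graph.edges, ndata=totals)


g = ToyGraph(edges=[(0, 1), (2, 1), (1, 2)], ndata={0: 1.0, 1: 2.0, 2: 3.0})
# Because both steps take and return a graph, they compose directly.
out = sum_neighbors(add_self_loops(g))
```

Keeping the graph as the unit that is passed around means structure and features travel together, so steps can be chained without unpacking index tensors by hand.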

Next is the optimization of the underlying system. Other graph neural network frameworks, such as PyTorch Geometric (PyG), often use gather/scatter primitives to support message passing. These generate a large number of redundant message objects during computation and consume a great deal of memory bandwidth. DGL instead uses efficient sparse operators to accelerate graph neural networks; according to the talk, it is 2 to 64 times faster than PyG, can reduce memory use by a factor of 6.3, and handles very large graphs well.
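The difference between the two strategies can be sketched in plain Python. This illustrates only the memory behavior, under the assumption of an unweighted sum over incoming edges; it does not reproduce the kernels of either library and says nothing about actual speed.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int]  # (source, destination)


def aggregate_gather_scatter(features: Matrix, edges: List[Edge]) -> Tuple[Matrix, int]:
    """Materialize one message per edge, then scatter-add into destinations."""
    # Gather: copy the source row for every edge (one buffer entry per edge).
    messages = [(dst, list(features[src])) for src, dst in edges]
    # Scatter: add each stored message into its destination row.
    out = [[0.0] * len(row) for row in features]
    for dst, message in messages:
        for i, value in enumerate(message):
            out[dst][i] += value
    return out, len(messages)


def aggregate_fused(features: Matrix, edges: List[Edge]) -> Matrix:
    """Accumulate directly into the output, as a sparse-dense product would."""
    out = [[0.0] * len(row) for row in features]
    for src, dst in edges:
        for i, value in enumerate(features[src]):
            out[dst][i] += value  # no per-edge message buffer is kept
    return out


features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 2), (1, 2), (2, 0)]
gather_scatter_out, message_count = aggregate_gather_scatter(features, edges)
fused_out = aggregate_fused(features, edges)
```

Both functions return the same result, but the first holds a buffer whose size grows with the number of edges, which is the redundancy the paragraph above describes.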

Finally, Wang Minjie shared his experience in building the DGL open source community, summarized in the following points.

First, code is not the only thing that matters; documentation makes up the other half of an open source project. Amazon has designed documentation at different levels. For beginners there is a tutorial for getting started with DGL in 120 minutes: download it, run it, and learn how to train a model. For advanced users there are user guides that cover the design concepts, along with the DGL API reference, so that users can grow from novice to expert step by step.

Second, the open source community needs a rich set of GNN model examples. The field is developing very fast, and to keep up, the project must cover many different GNN application scenarios through example models. At present, DGL has more than 70 classic GNN model examples covering a variety of fields and research directions.

Third, the project needs to focus on community interaction. Amazon has set up many community activities that bring developers together, such as regular GNN user group sessions to which leading scholars and developers from academia and industry are invited to share their work in the GNN field. In addition, the user forum, Slack, and WeChat groups give everyone different channels for communication.
