Paper Interpretation | Applying Graph Neural Networks to Named Entity Recognition and Relationship Extraction in Semi-structured Documents

Abstract: As management documents are widely used to transmit and record business information, there is an urgent need for methods that can automatically extract and understand their content robustly and efficiently. The paper interpreted here proposes using a graph neural network to solve named entity recognition (NER) and relationship extraction in semi-structured documents.

This article is shared from the HUAWEI CLOUD COMMUNITY post "Paper Interpretation Series 11: Graph Neural Network Applied to Semi-structured Document Named Entity Recognition and Relationship Extraction"; original author: Xiao Cainiao chg.

Summary:

As management documents are widely used to transmit and record business information, there is an urgent need for methods that can automatically extract and understand their content robustly and efficiently. Graph-based representations adapt flexibly to changes in document templates, which makes them a good fit for the semi-structured nature of these documents. Because a graph neural network (GNN) can learn the relationships between the data elements of a document, the paper interpreted here proposes using a GNN to solve named entity recognition (NER) and relationship extraction in semi-structured documents. According to the paper's experiments, the proposed method achieves SOTA results on three tasks: word grouping, entity classification, and relationship prediction. The results on two very different datasets, FUNSD (form understanding) and IEHHR (handwritten marriage record understanding), further support the method's ability to generalize.

1. Method

GNNs are widely used in tasks such as NER and table extraction. Building on this, the paper applies a GNN to the task of extracting key-value pairs: it not only classifies the entities in a document image but also predicts the relationships between them.

Given an input document, the model has to complete three tasks: (a) word grouping: detecting document entities, that is, grouping words that share the same semantics; (b) entity classification: assigning each detected entity to one of a preset list of categories; (c) relationship prediction: discovering which entities are paired with each other.

(1) The structure of the graph

The paper proposes constructing two graphs to represent a document and, on top of them, training three different models for the corresponding tasks: word grouping f_1, entity classification f_2, and relationship prediction f_3. As shown in Figure 1, the document is first represented as a graph G_1 = (V_1, E_1) built from the OCR result, where V_1 is the set of nodes, one for each word in the OCR result. The edge set E_1 is generated by k-nearest neighbors (k = 10) on the distances between the top-left corners of the word bounding boxes. A score s = f_1(G_1) is computed for each edge, and the edges whose score exceeds the threshold τ (0.65 for FUNSD, 0.9 for IEHHR) are retained to give the word grouping result.
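The graph construction and thresholding steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names `knn_edges` and `group_words` are hypothetical, the edge scores are passed in by hand where the trained model f_1 would normally supply them, and reading the final groups off as connected components of the retained edges is an assumption about how the thresholded graph becomes word groups.

```python
import math


def knn_edges(top_left, k=10):
    """Link every word to its k nearest neighbours.

    top_left: list of (x, y) top-left corners of the OCR word boxes.
    Returns directed edges (i, j), nearest neighbours first.
    """
    edges = []
    for i, (xi, yi) in enumerate(top_left):
        # Distance from word i to every other word j (ties broken by index).
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(top_left)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def group_words(n_words, edges, scores, tau):
    """Keep edges whose score exceeds tau, then return the connected
    components of what is left (assumed here to be the word groups)."""
    parent = list(range(n_words))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in zip(edges, scores):
        if score > tau:
            parent[find(i)] = find(j)

    groups = {}
    for w in range(n_words):
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


# Toy example: two word pairs far apart on the same text line.
corners = [(0, 0), (10, 0), (100, 0), (110, 0)]
edges = knn_edges(corners, k=1)
# Hand-written scores stand in for the output of the model f_1.
scores = [0.9, 0.9, 0.8, 0.8]
print(group_words(len(corners), edges, scores, tau=0.65))
```

In this toy run every retained edge scores above 0.65, so the four words collapse into two groups; raising the threshold to 0.9, as reported for IEHHR, would leave every word in its own group.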

Figure 1 Graph structure for word grouping

Figure 2 Graph structure for entity classification and relationship prediction

As shown in Figure 2, once the entities (that is, the word groups) have been obtained from G_1, a graph G_2 = (V_2, E_2) is built over them, where V_2 is the set of entities obtained by filtering G_1 and E_2 is the set of edges obtained by fully connecting the entity nodes. The entity classification result is given by c = f_2(G_2); the relationship prediction result is given by s = f_3(G_3).
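As a rough illustration of how the entity graph could be assembled from word groups, the sketch below merges each group into one entity node and connects every pair of entities. The function name `build_entity_graph`, the word dictionary layout (`text` plus a `box` of `(x0, y0, x1, y1)`), the merged bounding box, and the use of undirected pairs are all assumptions made for this example; the article does not specify the node features or the edge direction.

```python
from itertools import combinations


def build_entity_graph(words, groups):
    """Build a fully connected graph whose nodes are entities.

    words:  list of dicts with "text" and "box" = (x0, y0, x1, y1).
    groups: list of lists of word indices, one list per entity.
    Returns (nodes, edges) with edges as undirected index pairs.
    """
    nodes = []
    for group in groups:
        boxes = [words[i]["box"] for i in group]
        nodes.append({
            # Join the words in the order given by the group.
            "text": " ".join(words[i]["text"] for i in group),
            # Smallest box that covers every word of the entity.
            "box": (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            ),
        })
    # Full connection: one edge for every pair of entity nodes.
    edges = list(combinations(range(len(nodes)), 2))
    return nodes, edges


words = [
    {"text": "Date", "box": (0, 0, 30, 10)},
    {"text": "of", "box": (35, 0, 45, 10)},
    {"text": "birth:", "box": (50, 0, 80, 10)},
    {"text": "1990", "box": (100, 0, 130, 10)},
]
nodes, edges = build_entity_graph(words, [[0, 1, 2], [3]])
print(nodes[0]["text"], nodes[0]["box"], edges)
```

A classifier such as f_2 would then label each node (for example as a key or a value), and a relationship model would score each edge to decide whether the two entities form a pair.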

(2) Graph calculation

2. Experimental results

On FUNSD, the experimental results show that the proposed method still has room for improvement compared with LayoutLM, possibly because FUNSD is a small dataset. On IEHHR, the results show that the method is also effective in another area of form recognition, namely the understanding of handwritten records, which reflects its ability to generalize.
