Abstract: With the widespread use of management documents for transferring and recording business information, a method that can extract and understand their content automatically, robustly, and efficiently has become an urgent need. The paper interpreted here proposes using a graph neural network (GNN) to solve named entity recognition (NER) and relationship extraction in semi-structured documents.

This article is shared from HUAWEI CLOUD COMMUNITY "Paper Interpretation Series 11: Graph Neural Network Applied to Semi-structured Document Named Entity Recognition and Relationship Extraction", original author: Xiao Cainiao chg.

Summary:

With the widespread use of management documents for transmitting and recording business information, a method that can extract and understand their content automatically, robustly, and efficiently has become an urgent need. Graph-based representations adapt flexibly to variations across document templates, which makes them a good fit for the semi-structured nature of these documents. Because a graph neural network (GNN) can learn the relationships between the data elements of a document well, the paper interpreted here proposes using a GNN to solve named entity recognition (NER) and relationship extraction in semi-structured documents. The experiments show that the proposed method achieves state-of-the-art (SOTA) results on three tasks: word grouping, entity classification, and relationship prediction. The experiments cover two very different types of datasets, FUNSD (form understanding) and IEHHR (handwritten marriage record understanding), and the results further support the generalization ability of the proposed method.

1. Method

GNNs are widely used in tasks such as NER and table extraction. Building on this, the paper applies a GNN to the task of key-value pair extraction: it not only classifies the entities in a document image but also predicts the relationships between them.

Given an input document, the model has to complete three tasks: (a) word grouping: detecting document entities, that is, grouping words that belong to the same semantic unit; (b) entity classification: assigning each detected entity to one of a set of predefined categories; (c) relationship prediction: discovering which entities are paired with each other.

(1) The structure of the graph

The paper proposes constructing two graphs to represent a document and, on top of them, training three models, one per task: word grouping $f_1$, entity classification $f_2$, and relationship prediction $f_3$. As shown in Figure 1, the document is first represented as a graph $G_1=(V_1,E_1)$ built from the OCR result, where $V_1$ is the set of nodes, one for each word in the OCR result. The edge set $E_1$ is generated by $k$-nearest neighbors ($k=10$) on the distance between the top-left corners of the word bounding boxes. A score $s=f_1(G_1)$ is computed for each edge, and the edges whose score exceeds the threshold $\tau$ (0.65 for FUNSD, 0.9 for IEHHR) are selected to obtain the word grouping.

Figure 1. Graph construction for word grouping
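The construction of the word graph and the grouping step can be illustrated with a minimal sketch. It uses only the Python standard library and rests on several assumptions that are not stated in the article: the function names `knn_edges` and `group_words` are hypothetical, the edge scores are passed in as plain numbers standing in for the output of the trained model $f_1$, and words joined by the selected edges are merged into entities through connected components.

```python
import math
from collections import defaultdict


def knn_edges(corners, k=10):
    """Undirected k-nearest-neighbour edges between words.

    corners: list of (x, y) top-left corners of the OCR word boxes.
    """
    edges = set()
    for i, (xi, yi) in enumerate(corners):
        # Rank every other word by Euclidean distance to word i.
        others = sorted(
            (j for j in range(len(corners)) if j != i),
            key=lambda j: math.hypot(corners[j][0] - xi, corners[j][1] - yi),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each pair once
    return sorted(edges)


def group_words(n_words, edges, scores, tau):
    """Keep edges with score > tau and return the connected components.

    Assumption: grouping by connected components (implemented with union-find).
    """
    parent = list(range(n_words))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), score in zip(edges, scores):
        if score > tau:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for word in range(n_words):
        groups[find(word)].append(word)
    return sorted(groups.values())


if __name__ == "__main__":
    # Four words on one line: two close pairs far apart from each other.
    corners = [(0, 0), (10, 0), (100, 0), (110, 0)]
    edges = knn_edges(corners, k=2)
    # Made-up scores standing in for s = f_1(G_1), one per edge.
    scores = [0.9, 0.1, 0.2, 0.1, 0.8]
    print(group_words(len(corners), edges, scores, tau=0.65))
```

The threshold matters because a single wrongly kept edge merges two entities into one, which may be one reason the paper sets a different $\tau$ per dataset.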

Figure 2. Graph construction for entity classification and relationship prediction

As shown in Figure 2, once the entities (that is, the groups of words) have been obtained from $G_1$, a second graph $G_2=(V_2,E_2)$ is built over them, where $V_2$ is the set of entities obtained from $G_1$ and $E_2$ is the set of edges obtained by fully connecting the entity nodes. The entity classification result is given by $c=f_2(G_2)$, and the relationship prediction result is given by $s=f_3(G_2)$.
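The step from word groups to the entity graph can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the function name `build_entity_graph` is hypothetical, and the node attributes chosen here (the joined text and the union of the word boxes) are assumed, since the article does not list the node features.

```python
from itertools import combinations


def build_entity_graph(words, groups):
    """Build a fully connected graph over entities.

    words:  list of (text, (x0, y0, x1, y1)) OCR words.
    groups: list of word-index lists, one list per entity.
    """
    nodes = []
    for group in groups:
        boxes = [words[i][1] for i in group]
        nodes.append({
            "text": " ".join(words[i][0] for i in group),
            # Assumed entity box: the union of its word boxes.
            "box": (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            ),
        })
    # Full connection: one undirected edge per pair of entities.
    edges = list(combinations(range(len(nodes)), 2))
    return nodes, edges


if __name__ == "__main__":
    words = [
        ("Date", (0, 0, 40, 10)),
        ("of", (45, 0, 60, 10)),
        ("birth:", (65, 0, 110, 10)),
        ("1990", (200, 0, 240, 10)),
    ]
    nodes, edges = build_entity_graph(words, [[0, 1, 2], [3]])
    print(nodes)
    print(edges)
```

A fully connected graph over n entities has n(n-1)/2 edges, so every possible key-value pairing is available as a candidate for relationship prediction, at a cost that grows quadratically with the number of entities.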

(2) Graph calculation

image.png

2. Experimental results

image.png

On FUNSD, the results show that the proposed method still has room for improvement compared with LayoutLM, possibly because FUNSD is a small dataset. On IEHHR, the results show that the method is also effective in another area of form recognition, namely the understanding of handwritten records, which reflects its generalization ability.
