Abstract: This article is a preliminary reading of the ACL 2021 NER paper on a BERT-based hidden Markov model for multi-source weakly supervised named entity recognition.
This article is shared from the Huawei Cloud Community " ACL2021 NER | BERT-based Hidden Markov Model for Multi-source Weakly Supervised Named Entity Recognition ", author: JuTzungKuei.
Paper: Li Yinghao, Shetty Pranav, Liu Lucas, Zhang Chao, Song Le. BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6178–6190. Online: Association for Computational Linguistics, 2021.
Link: https://aclanthology.org/2021.acl-long.482.pdf
Code: https://github.com/Yinghao-Li/CHMM-ALT
0. Summary
- Research focus: learning NER from noisy labels produced by multiple weak supervision sources
- The noisy labels are incomplete, inaccurate, and often contradict one another
Proposes a conditional hidden Markov model (CHMM)
- Uses BERT's contextual representations to enhance the classic HMM
- Learns token-level transition and emission probabilities from BERT embeddings and infers the latent true labels (a minimal sketch follows this list)
An alternate-training scheme (CHMM-ALT) further improves CHMM
- The labels inferred by CHMM are used to fine-tune a BERT-NER model
- The output of BERT-NER is then fed back as an additional weak source for training CHMM
State-of-the-art results on four benchmark datasets
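The core idea of CHMM, predicting the HMM's transition and emission probabilities from BERT embeddings instead of keeping them fixed, can be illustrated with a minimal sketch. This is not the authors' implementation; the class name, the plain linear heads, and the tensor shapes are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code) of CHMM's "conditional" idea:
# per-token transition and emission matrices predicted from BERT embeddings.
import torch
import torch.nn as nn

class ChmmHeadsSketch(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int, num_sources: int):
        super().__init__()
        self.num_labels = num_labels
        self.num_sources = num_sources
        # Maps a token's BERT embedding to a label-transition matrix P(z_t | z_{t-1}).
        self.transition_head = nn.Linear(hidden_dim, num_labels * num_labels)
        # Maps it to one emission matrix per weak source: P(x_{k,t} | z_t).
        self.emission_head = nn.Linear(hidden_dim, num_sources * num_labels * num_labels)

    def forward(self, bert_embeddings: torch.Tensor):
        # bert_embeddings: (batch, seq_len, hidden_dim)
        b, t, _ = bert_embeddings.shape
        trans = self.transition_head(bert_embeddings)
        trans = trans.view(b, t, self.num_labels, self.num_labels).softmax(dim=-1)
        emit = self.emission_head(bert_embeddings)
        emit = emit.view(b, t, self.num_sources, self.num_labels, self.num_labels).softmax(dim=-1)
        return trans, emit  # row-normalized probability matrices for every token
```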
1. Introduction
NER is a fundamental task underlying many downstream information-extraction applications: event extraction, relation extraction, question answering
- Supervised NER requires large amounts of labeled data
- Many domains already have knowledge sources: knowledge bases, domain dictionaries, labeling rules
- These sources can be matched against raw corpora to quickly generate large-scale noisy training data from multiple angles (see the toy example after these bullets)
- Distantly supervised NER uses only a knowledge base as weak supervision, ignoring the complementary information that multi-source annotation provides
- Existing HMM-based aggregation methods are limited: they rely on one-hot word vectors or do not model word context at all
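To make "incomplete, inaccurate, contradictory" concrete, here is a hypothetical example of what several weak sources might produce for one sentence; the sentence, source names, and tags are invented for illustration.

```python
# Hypothetical multi-source weak labels for one sentence (BIO scheme).
tokens = ["Barack", "Obama", "visited", "Apple", "headquarters"]

weak_labels = {
    # A person gazetteer finds the PER span but knows nothing about ORG (incomplete).
    "person_gazetteer":    ["B-PER",  "I-PER",  "O", "O",      "O"],
    # A knowledge-base matcher finds the ORG span but misses the PER span (incomplete).
    "org_knowledge_base":  ["O",      "O",      "O", "B-ORG",  "O"],
    # A capitalization rule tags every capitalized token with a generic type
    # (inaccurate, and it contradicts the other sources on entity types).
    "capitalization_rule": ["B-MISC", "I-MISC", "O", "B-MISC", "O"],
}

# A label aggregator has to recover the latent true sequence,
# e.g. ["B-PER", "I-PER", "O", "B-ORG", "O"], from these noisy views.
```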
Contributions:
- CHMM: aggregates weak labels from multiple sources
- Alternate training (CHMM-ALT): trains CHMM and BERT-NER in turn over several rounds, each consuming the other's output, to optimize multi-source weakly supervised NER
State-of-the-art results on four benchmark datasets
2. Method
CHMM-ALT trains two models, the multi-source label aggregator CHMM and a BERT-NER model, which take turns consuming each other's output (a high-level sketch of this loop follows the bullets below)
- Stage I: CHMM infers denoised labels $y^{*(1:T)}$ from the $K$ weak sources $x_{1:K}^{(1:T)}$; these labels are used to fine-tune the BERT-NER model, whose output $\widetilde{y}^{(1:T)}$ is then added to the weak-label set as an extra source: $x_{1:K+1}^{(1:T)} = \{x_{1:K}^{(1:T)}, \widetilde{y}^{(1:T)}\}$
- Stage II: CHMM and BERT-NER improve each other over several rounds; in each round CHMM is trained first, then BERT-NER is fine-tuned, and CHMM's input is updated with BERT-NER's new predictions
- CHMM mainly improves precision, while BERT-NER improves recall
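The two stages above form an alternation loop. The sketch below captures only the control flow; the helper callables (train_chmm, chmm_decode, finetune_bert_ner, bert_predict) are hypothetical placeholders supplied by the caller, not functions from the CHMM-ALT repository.

```python
from typing import Callable, List, Sequence

def chmm_alt_sketch(
    weak_sources: List[Sequence],    # the K original weak-label sequences
    corpus: Sequence,                # the unlabeled sentences
    train_chmm: Callable,            # (sources, corpus) -> CHMM model
    chmm_decode: Callable,           # (chmm, sources, corpus) -> denoised labels y*
    finetune_bert_ner: Callable,     # (corpus, labels) -> BERT-NER model
    bert_predict: Callable,          # (bert_ner, corpus) -> predicted labels ~y
    num_rounds: int = 3,
):
    # Stage I: CHMM denoises the K weak sources, its labels fine-tune BERT-NER,
    # and BERT-NER's predictions join the pool as the (K+1)-th weak source.
    chmm = train_chmm(weak_sources, corpus)
    denoised = chmm_decode(chmm, weak_sources, corpus)
    bert_ner = finetune_bert_ner(corpus, denoised)
    sources = list(weak_sources) + [bert_predict(bert_ner, corpus)]

    # Stage II: the two models alternate for several rounds, each round
    # refreshing the extra source with BERT-NER's latest predictions.
    for _ in range(num_rounds):
        chmm = train_chmm(sources, corpus)
        denoised = chmm_decode(chmm, sources, corpus)
        bert_ner = finetune_bert_ner(corpus, denoised)
        sources[-1] = bert_predict(bert_ner, corpus)

    return chmm, bert_ner
```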
Hidden Markov Model
- A classic HMM keeps fixed transition and emission matrices; CHMM instead predicts token-specific matrices from each token's BERT embedding
- The latent true label sequence is then inferred with the standard forward-backward / Viterbi machinery (a minimal forward-recursion sketch follows)
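For readers unfamiliar with HMM inference, the snippet below shows the scaled forward recursion with a different transition matrix at every token, which is the basic routine such a model uses to score an observed weak-label sequence. It is a generic textbook computation under assumed shapes, not the paper's exact training objective.

```python
import numpy as np

def forward_loglik(init: np.ndarray, trans: np.ndarray, emit_probs: np.ndarray) -> float:
    """Scaled forward algorithm with token-wise transition matrices.

    init:       (L,)      initial distribution over hidden labels
    trans:      (T, L, L) per-token transition matrices, rows sum to 1
    emit_probs: (T, L)    P(observed weak tag at t | hidden label), already looked up
    Returns the log-likelihood of the observed tag sequence.
    """
    alpha = init * emit_probs[0]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale
    for t in range(1, emit_probs.shape[0]):
        alpha = (alpha @ trans[t]) * emit_probs[t]  # propagate, then weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale                              # rescale to avoid numerical underflow
    return float(loglik)

# Tiny usage example with random, row-normalized parameters: T = 4 tokens, L = 3 labels.
rng = np.random.default_rng(0)
T, L = 4, 3
trans = rng.random((T, L, L)); trans /= trans.sum(axis=-1, keepdims=True)
emit_probs = rng.random((T, L))
init = np.full(L, 1.0 / L)
print(forward_loglik(init, trans, emit_probs))
```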
3. Results