Notes on Few-shot Learning for Named Entity Recognition in Medical Text

1. Summary

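This paper studies named entity recognition (NER) on several electronic health record (EHR) datasets. Building on other related datasets for pre-training, it performs few-shot learning with only 10 samples from the target dataset, and proposes five methods (tricks) for improving performance:
(1) layer-wise initialization with pre-trained weights
(2) hyperparameter tuning
(3) combining pre-training data
(4) custom word embeddings
(5) optimizing out-of-vocabulary (OOV) words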

2. Content

The datasets used in the paper are mainly medical-domain datasets, plus the CoNLL-2003 English newswire dataset.

The baseline model is the BLSTM-CNNs architecture proposed by J. Chiu et al. Its key feature is that it concatenates character, word, and casing embeddings. The casing embedding covers the categories numeric, allLower, allUpper, mainly_numeric (more than 50% of the characters of a word are numeric), initialUpper, contains_digit, padding, and other.
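To make the casing categories concrete, here is a minimal Python sketch of how a token could be mapped to one of them. The function name `casing_category`, the `PADDING_TOKEN` value, and the order in which the checks run are assumptions made for illustration; the notes do not say how the paper resolves tokens that match more than one category.

```python
# Hypothetical padding symbol; the actual token used in the paper is not given.
PADDING_TOKEN = "<PAD>"


def casing_category(word: str) -> str:
    """Map a token to one of the eight casing categories.

    The check order is an assumption: numeric tests run first, then
    letter-case tests, then the weaker contains_digit test.
    """
    if word == PADDING_TOKEN:
        return "padding"

    num_digits = sum(ch.isdigit() for ch in word)

    if word and num_digits == len(word):
        return "numeric"
    # "mainly_numeric": more than 50% of the characters are digits.
    if word and num_digits / len(word) > 0.5:
        return "mainly_numeric"
    if word.islower():
        return "allLower"
    if word.isupper():
        return "allUpper"
    if word[:1].isupper():
        return "initialUpper"
    if num_digits > 0:
        return "contains_digit"
    return "other"
```

The resulting category name would then be looked up in a small embedding table and concatenated with the word and character representations of the same token.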

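The five performance tricks are as follows:
(1) Single pre-training: pre-train on each of the other datasets individually, and compare four settings: pre-trained weights for all layers, for the BLSTM layer only, for all layers except the BLSTM, and no pre-trained weights.
(2) Hyperparameter tuning: covers the optimizer, the pre-training dataset, the SGD learning rate, batch normalization (used or not), word embeddings (trainable or not), and learning rate decay (constant or time scheduled).
(3) Combined pre-training: chain multiple datasets together to pre-train the model, then load the resulting weights when training on the target dataset.
(4) Customized word embeddings: use GloVe word embeddings, or embeddings re-trained with FastText on medical datasets.
(5) Optimizing OOV words: remove trailing ":", ";", "." and "-"; remove quotations; remove leading "+".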
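The OOV clean-up rules in trick (5) are simple string operations, so a short Python sketch may help. The function name `normalize_oov`, the set of quote characters, and the choice to leave in-vocabulary tokens untouched are assumptions for illustration, not details confirmed by the paper.

```python
# Assumed set of quotation characters to strip from either end of a token.
QUOTE_CHARS = "\"'“”‘’`"


def normalize_oov(token: str, vocab: set) -> str:
    """Clean a token that is missing from the embedding vocabulary.

    Applies the three rules from the notes: remove quotations, remove a
    leading "+", and remove trailing ":", ";", "." and "-".
    """
    if token in vocab:
        return token  # already covered by the embeddings, nothing to do

    cleaned, previous = token, None
    # Repeat until stable so that inputs like '"fever".' are fully cleaned.
    while cleaned != previous:
        previous = cleaned
        cleaned = cleaned.strip(QUOTE_CHARS).lstrip("+").rstrip(":;.-")

    # If everything was stripped away, fall back to the original token.
    return cleaned if cleaned else token


vocab = {"fever", "nausea", "pain"}
print([normalize_oov(t, vocab) for t in ['"fever."', "+nausea", "pain:", "--"]])
```

After cleaning, a token such as `"pain:"` maps to `"pain"`, which has a pre-trained vector, instead of falling back to the unknown-word embedding.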

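The results for the five tricks are:
(1) Single pre-training: F1-score improves by +4.52%.
(2) Hyperparameter tuning: the choice of optimizer matters most (NAdam >> SGD), followed by the choice of pre-training dataset (+2.34%).
(3) Combined pre-training: chaining multiple datasets for pre-training has a negative effect (-1.85%).
(4) Customizing word embeddings: self-trained word embeddings give an improvement of +3.78%.
(5) Optimizing OOV words: an improvement of +0.87%.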
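The four initialization settings compared under single pre-training (all layers, BLSTM only, all layers except the BLSTM, none) can be sketched without any deep learning framework. In this minimal Python example, weights are plain dictionaries from layer name to a list of numbers; the function name `init_from_pretrained`, the mode strings, and the `"blstm"` layer-name prefix are all hypothetical and stand in for whatever the real model uses.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]

# Hypothetical names for the four settings compared in the notes.
MODES = ("all", "blstm_only", "all_except_blstm", "none")


def init_from_pretrained(
    fresh: Weights,
    pretrained: Weights,
    mode: str,
    blstm_prefix: str = "blstm",
) -> Weights:
    """Return initial weights for the target model, chosen layer by layer."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    result: Weights = {}
    for name, value in fresh.items():
        is_blstm = name.startswith(blstm_prefix)
        use_pretrained = (
            mode == "all"
            or (mode == "blstm_only" and is_blstm)
            or (mode == "all_except_blstm" and not is_blstm)
        )
        # Keep the fresh initialization when the layer has no pre-trained copy.
        if use_pretrained and name in pretrained:
            result[name] = list(pretrained[name])
        else:
            result[name] = list(value)
    return result


fresh = {"char_cnn": [0.0], "blstm": [0.0], "output": [0.0]}
pretrained = {"char_cnn": [1.0], "blstm": [2.0], "output": [3.0]}
print(init_from_pretrained(fresh, pretrained, "blstm_only"))
```

With a real framework the same selection would be applied to the per-layer weight tensors before fine-tuning on the 10 target samples.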

Mecthew
