Jana Straková, Milan Straka, Jan Hajič
We propose two neural network architectures for nested named entity recognition (NER), a setting in which named entities may overlap and may also be labeled with more than one label. We encode the nested labels using a linearized scheme. In the first approach, the nested labels are modeled in a standard LSTM-CRF architecture as multilabels corresponding to the Cartesian product of the nested labels. In the second, nested NER is viewed as a sequence-to-sequence problem in which the input sequence consists of the tokens and the output sequence consists of the labels, using hard attention on the word whose label is being predicted. The proposed methods outperform the nested NER state of the art on four corpora: ACE-2004, ACE-2005, GENIA and Czech CNEC. We also enrich our architectures with the recently published contextual embeddings ELMo, BERT and Flair, reaching further improvements on all four nested entity corpora. In addition, we report flat NER state-of-the-art results for CoNLL-2002 Dutch and Spanish and for CoNLL-2003 English.
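To make the linearization idea concrete, the sketch below encodes possibly nested mentions as one multilabel per token, which is the kind of target a single LSTM-CRF output layer can predict. The abstract does not specify the exact scheme. This example therefore assumes BILOU tags, a `|` separator, and a priority order in which earlier-starting and then longer mentions come first. These choices, and the helper names `linearize` and `seq2seq_targets`, are illustrative assumptions rather than the authors' code. For the sequence-to-sequence view, the same multilabel is split into a per-token label sequence closed by a hypothetical end-of-word marker `<eow>`.

```python
from typing import List, Tuple

# A mention is (start, end_exclusive, label) over token indices.
Mention = Tuple[int, int, str]


def bilou_tag(position: int, start: int, end: int, label: str) -> str:
    """BILOU tag of token `position` inside the mention [start, end)."""
    if end - start == 1:
        return f"U-{label}"
    if position == start:
        return f"B-{label}"
    if position == end - 1:
        return f"L-{label}"
    return f"I-{label}"


def linearize(num_tokens: int, mentions: List[Mention], sep: str = "|") -> List[str]:
    """Encode possibly nested mentions as one multilabel string per token.

    Assumed priority: mentions starting earlier come first; among mentions
    with the same start, longer ones come first.
    """
    ordered = sorted(mentions, key=lambda m: (m[0], -(m[1] - m[0])))
    per_token: List[List[str]] = [[] for _ in range(num_tokens)]
    for start, end, label in ordered:
        for i in range(start, end):
            per_token[i].append(bilou_tag(i, start, end, label))
    # Tokens outside every mention get the plain "O" label.
    return [sep.join(tags) if tags else "O" for tags in per_token]


def seq2seq_targets(multilabels: List[str], sep: str = "|", end: str = "<eow>") -> List[List[str]]:
    """Split each multilabel into a label sequence closed by an end marker."""
    return [ml.split(sep) + [end] for ml in multilabels]


if __name__ == "__main__":
    # Toy example: a person name and a location nested inside an organization.
    tokens = ["at", "Charles", "University", "in", "Prague"]
    mentions = [(1, 5, "ORG"), (1, 2, "PER"), (4, 5, "LOC")]
    labels = linearize(len(tokens), mentions)
    for token, label in zip(tokens, labels):
        print(token, label)
    # Each distinct multilabel is one output class for the LSTM-CRF.
    print(sorted(set(labels)))
    print(seq2seq_targets(labels))
```

In this sketch, the LSTM-CRF variant treats every distinct multilabel string (for example `B-ORG|U-PER`) as a single class, so the label set grows with the combinations observed in training. The sequence-to-sequence variant instead predicts the component labels of each token one at a time.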
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Named Entity Recognition (NER) | ACE 2004 | F1 | 84.4 | seq2seq+BERT+Flair |
| Named Entity Recognition (NER) | ACE 2005 | F1 | 84.33 | seq2seq+BERT+Flair |
| Named Entity Recognition (NER) | CoNLL 2003 (German) | F1 | 85.1 | Straková et al., 2019 |
| Named Entity Recognition (NER) | CoNLL 2003 (English) | F1 | 93.38 | LSTM-CRF+ELMo+BERT+Flair |
| Named Entity Recognition (NER) | CoNLL 2002 (Spanish) | F1 | 88.8 | Straková et al., 2019 |
| Named Entity Recognition (NER) | CoNLL 2002 (Dutch) | F1 | 92.7 | Straková et al., 2019 |
| Named Entity Recognition (NER) | GENIA | F1 | 78.31 | seq2seq+BERT+Flair |
| Nested Mention Recognition | ACE 2005 | F1 | 84.33 | seq2seq+BERT+Flair |
| Nested Mention Recognition | ACE 2004 | F1 | 84.4 | seq2seq+BERT+Flair |