Yue Zhang, Jie Yang
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word-sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results across all four benchmark datasets.
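As a minimal sketch of the lexicon-matching step described above, the snippet below enumerates every character span of a sentence that matches a word lexicon; these spans are the candidate word paths a lattice LSTM would add alongside the character sequence. The toy lexicon, sentence, and function name are illustrative assumptions, not the paper's actual resources.

```python
# Hypothetical sketch: find all lexicon words over a character sequence.
# A lattice LSTM would build one gated word cell per matched span.

def match_lexicon_words(chars, lexicon, max_word_len=4):
    """Return (start, end, word) for every multi-character lexicon match."""
    spans = []
    for i in range(len(chars)):
        # Only consider spans up to max_word_len characters long.
        for j in range(i + 1, min(i + max_word_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if len(word) > 1 and word in lexicon:  # words, not single chars
                spans.append((i, j, word))
    return spans

# Toy example (illustrative): "Nanjing City Yangtze River Bridge"
lexicon = {"南京", "南京市", "长江", "长江大桥", "大桥", "市长"}
sentence = list("南京市长江大桥")
print(match_lexicon_words(sentence, lexicon))
```

Note that overlapping, ambiguous spans such as 市长 ("mayor") and 长江 ("Yangtze River") are both kept; the model's gates, not a hard segmenter, decide which paths contribute, which is why the approach avoids segmentation errors.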
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Named Entity Recognition (NER) | Weibo NER | F1 (%) | 58.79 | Lattice LSTM |
| Named Entity Recognition (NER) | MSRA | F1 (%) | 93.18 | Lattice LSTM |
| Named Entity Recognition (NER) | Resume NER | F1 (%) | 94.46 | Lattice LSTM |
| Named Entity Recognition (NER) | OntoNotes 4 | F1 (%) | 73.88 | Lattice LSTM |