Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is notably better at distinguishing between related but distinct entities (e.g., President and CEO). However, there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing. Our code and models are publicly available.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Coreference Resolution | OntoNotes | F1 | 76.9 | BERT-large |
| Coreference Resolution | OntoNotes | F1 | 73.9 | BERT-base |
| Coreference Resolution | CoNLL 2012 | Avg F1 | 76.9 | c2f-coref + BERT-large |
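
The "Avg F1" metric in the table is, under the usual CoNLL-2012 convention, the unweighted mean of the MUC, B-cubed, and CEAF-e F1 scores. The abstract does not spell this out, so the sketch below assumes that convention. The function name `conll_avg_f1` and the input values are hypothetical and are not results from the paper.

```python
from statistics import mean


def conll_avg_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the unweighted mean of three coreference F1 scores.

    Assumes the CoNLL-2012 convention of averaging MUC, B-cubed, and CEAF-e.
    """
    return mean([muc_f1, b_cubed_f1, ceaf_e_f1])


# Hypothetical inputs for illustration only, not figures from the paper.
example = conll_avg_f1(80.0, 70.0, 60.0)
print(f"{example:.1f}")
```

Because the average weights the three metrics equally, a gain in any single metric moves the reported Avg F1 by one third of that gain.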