Christian Hadiwinoto, Hwee Tou Ng, Wee Chung Gan
Contextualized word representations can assign different representations to the same word in different contexts, and they have been shown to be effective in downstream natural language processing tasks such as question answering, named entity recognition, and sentiment analysis. However, evaluations on word sense disambiguation (WSD) in prior work show that using contextualized word representations does not outperform the state-of-the-art approach that relies on non-contextualized word embeddings. In this paper, we explore different strategies for integrating pre-trained contextualized word representations, and our best strategy achieves accuracies that exceed the best previously published accuracies by significant margins on multiple benchmark WSD datasets. We make the source code available at https://github.com/nusnlp/contextemb-wsd.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Word Sense Disambiguation | SemEval 2007 (supervised) | Accuracy | 68.1 | BERT (linear projection) |
| Word Sense Disambiguation | SemEval 2013 (supervised) | Accuracy | 71.1 | BERT (linear projection) |
| Word Sense Disambiguation | SemEval 2015 (supervised) | Accuracy | 76.2 | BERT (linear projection) |
| Word Sense Disambiguation | Senseval 2 (supervised) | Accuracy | 75.5 | BERT (linear projection) |
| Word Sense Disambiguation | Senseval 3 (supervised) | Accuracy | 73.6 | BERT (linear projection) |
| Word Sense Disambiguation | SemEval 2007 (supervised) | Accuracy | 63.3 | BERT (nearest neighbour) |
| Word Sense Disambiguation | SemEval 2013 (supervised) | Accuracy | 69.2 | BERT (nearest neighbour) |
| Word Sense Disambiguation | SemEval 2015 (supervised) | Accuracy | 74.4 | BERT (nearest neighbour) |
| Word Sense Disambiguation | Senseval 2 (supervised) | Accuracy | 73.8 | BERT (nearest neighbour) |
| Word Sense Disambiguation | Senseval 3 (supervised) | Accuracy | 71.6 | BERT (nearest neighbour) |
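The table names two integration strategies, "linear projection" and "nearest neighbour". The sketch below is not the authors' released code (see the repository linked above for that); it is a minimal illustration, assuming a sense-annotated training corpus given as (tokenized sentence, target word index, sense label) triples, of what such strategies can look like on top of BERT embeddings. All function and class names here are illustrative.

```python
# Minimal sketch of two WSD strategies over contextualized BERT embeddings:
# (1) nearest neighbour against sense-annotated training occurrences,
# (2) a linear projection from the contextualized vector to sense scores.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

def target_embedding(sentence_tokens, target_index):
    """Contextualized vector of one target word (its first sub-token), from the last BERT layer."""
    enc = tokenizer(sentence_tokens, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]          # (num_subtokens, hidden_dim)
    # Map the word-level target index to its first sub-token position.
    sub_positions = [i for i, w in enumerate(enc.word_ids()) if w == target_index]
    return hidden[sub_positions[0]]

# Strategy 1: nearest neighbour -- label a test occurrence with the sense of the
# most similar training occurrence of the same lemma in embedding space.
def nearest_neighbour_sense(test_vec, train_vecs, train_senses):
    sims = torch.stack([torch.cosine_similarity(test_vec, v, dim=0) for v in train_vecs])
    return train_senses[int(sims.argmax())]

# Strategy 2: linear projection -- a trainable linear layer maps the contextualized
# vector to one score per sense; prediction takes the argmax over candidate senses.
class LinearProjectionWSD(torch.nn.Module):
    def __init__(self, hidden_dim, num_senses):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_dim, num_senses)

    def forward(self, context_vec):                        # (batch, hidden_dim)
        return self.proj(context_vec)                      # (batch, num_senses) sense logits
```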