Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier

2021-04-16 · EMNLP 2021
Tasks: Cross-Lingual Semantic Textual Similarity, Sentence Similarity, Entity Linking, Semantic Similarity, Semantic Textual Similarity, Contrastive Learning, STS
Paper · PDF · Code (official)

Abstract

Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and suggest some evidence on why this simple approach can yield effective universal lexical and sentence encoders.
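The core of the identity fine-tuning described above is a contrastive objective over two views of the same string: each example is pulled toward its own (identical or slightly modified) copy and pushed away from the other in-batch examples. A minimal NumPy sketch of such an InfoNCE-style loss follows; the function name `info_nce`, the temperature value, and the batch construction are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def info_nce(z1, z2, tau=0.04):
    """InfoNCE-style contrastive loss: row i of z1 should be most
    similar to row i of z2 (its own augmented view) among all rows."""
    # L2-normalise so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: view i pairs with example i
    return -np.mean(np.diag(log_probs))
```

With perfectly aligned views the loss approaches zero; shuffling the second view's rows breaks the pairing and the loss rises. Minimising this gap over a batch of self-pairs is what drives the fast fine-tuning the abstract describes.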

Results

Task | Dataset | Metric | Value | Model
Semantic Textual Similarity | STS14 | Spearman Correlation | 0.732 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS14 | Spearman Correlation | 0.713 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS15 | Spearman Correlation | 0.814 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS15 | Spearman Correlation | 0.798 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | SICK | Spearman Correlation | 0.706 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | SICK | Spearman Correlation | 0.703 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS13 | Spearman Correlation | 0.819 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS13 | Spearman Correlation | 0.796 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.787 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.764 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS12 | Spearman Correlation | 0.674 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS12 | Spearman Correlation | 0.648 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS16 | Spearman Correlation | 0.780 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS16 | Spearman Correlation | 0.743 | Mirror-BERT-base (unsup.)
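Every value in the table is a Spearman correlation: the encoder's predicted similarity scores for the test pairs are converted to ranks and correlated against the ranked human judgements, so only the ordering of pairs matters, not the absolute scores. A self-contained sketch of the metric (computed here as Pearson correlation over rank positions, with ties broken arbitrarily; the function name `spearman` is illustrative):

```python
import numpy as np

def spearman(pred, gold):
    """Spearman correlation = Pearson correlation of the ranks."""
    def ranks(x):
        # map each value to its rank position (ties broken arbitrarily)
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(len(x), dtype=float)
        return r
    rp, rg = ranks(np.asarray(pred)), ranks(np.asarray(gold))
    rp -= rp.mean()
    rg -= rg.mean()
    return float(rp @ rg / (np.linalg.norm(rp) * np.linalg.norm(rg)))
```

Because only ranks enter the computation, any monotone transformation of the predicted similarities (e.g., squaring positive cosine scores) leaves the reported correlation unchanged.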

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation (2025-07-15)
Latent Space Consistency for Sparse-View CT Reconstruction (2025-07-15)
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)