Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier

2021-04-16 · EMNLP 2021
Tasks: Cross-Lingual Semantic Textual Similarity, Sentence Similarity, Entity Linking, Semantic Similarity, Semantic Textual Similarity, Contrastive Learning, STS
Paper · PDF · Code (official)

Abstract

Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and suggest some evidence on why this simple approach can yield effective universal lexical and sentence encoders.
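The core of the identity fine-tuning described above is a contrastive objective over two views of the same string: each example is pulled toward its own (identical or slightly modified) copy and pushed away from the other in-batch examples. A minimal NumPy sketch of such an InfoNCE-style loss follows; the function name `info_nce`, the temperature value, and the batch construction are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def info_nce(z1, z2, tau=0.04):
    """InfoNCE-style contrastive loss: row i of z1 should be most
    similar to row i of z2 (its own augmented view) among all rows."""
    # L2-normalise so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: view i pairs with example i
    return -np.mean(np.diag(log_probs))
```

With perfectly aligned views the loss approaches zero; shuffling the second view's rows breaks the pairing and the loss rises. Minimising this gap over a batch of self-pairs is what drives the fast fine-tuning the abstract describes.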

Results

Task | Dataset | Metric | Value | Model
Semantic Textual Similarity | STS14 | Spearman Correlation | 0.732 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS14 | Spearman Correlation | 0.713 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS15 | Spearman Correlation | 0.814 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS15 | Spearman Correlation | 0.798 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | SICK | Spearman Correlation | 0.706 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | SICK | Spearman Correlation | 0.703 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS13 | Spearman Correlation | 0.819 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS13 | Spearman Correlation | 0.796 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.787 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.764 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS12 | Spearman Correlation | 0.674 | Mirror-BERT-base (unsup.)
Semantic Textual Similarity | STS12 | Spearman Correlation | 0.648 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS16 | Spearman Correlation | 0.780 | Mirror-RoBERTa-base (unsup.)
Semantic Textual Similarity | STS16 | Spearman Correlation | 0.743 | Mirror-BERT-base (unsup.)
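Every value in the table is a Spearman correlation: the encoder's predicted similarity scores for the test pairs are converted to ranks and correlated against the ranked human judgements, so only the ordering of pairs matters, not the absolute scores. A self-contained sketch of the metric (computed here as Pearson correlation over rank positions, with ties broken arbitrarily; the function name `spearman` is illustrative):

```python
import numpy as np

def spearman(pred, gold):
    """Spearman correlation = Pearson correlation of the ranks."""
    def ranks(x):
        # map each value to its rank position (ties broken arbitrarily)
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(len(x), dtype=float)
        return r
    rp, rg = ranks(np.asarray(pred)), ranks(np.asarray(gold))
    rp -= rp.mean()
    rg -= rg.mean()
    return float(rp @ rg / (np.linalg.norm(rp) * np.linalg.norm(rg)))
```

Because only ranks enter the computation, any monotone transformation of the predicted similarities (e.g., squaring positive cosine scores) leaves the reported correlation unchanged.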

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation (2025-07-15)
Latent Space Consistency for Sparse-View CT Reconstruction (2025-07-15)
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)