Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov

Published 2021-09-27 · ICLR 2022
Tasks: Paraphrase Identification · Sentence Similarity · Semantic Textual Similarity · Contrastive Learning · Language Modelling
Links: Paper · PDF · Code (official)

Abstract

In NLP, many tasks involve pairwise comparison between two sequences (e.g. sentence similarity and paraphrase identification). Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient; however, they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance, but they require task fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model termed Trans-Encoder that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder, and then alternate between the bi- and cross-encoder task formulations. In each alternation, one task formulation produces pseudo-labels which are used as learning signals for the other task formulation. We then propose an extension that conducts this self-distillation on multiple PLMs in parallel, using the average of their pseudo-labels for mutual-distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder and also a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks.
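The alternating distillation loop described in the abstract can be illustrated with a toy numerical sketch. This is not the paper's training code (the actual method fine-tunes PLM weights against pseudo-label losses); here each formulation is reduced to a vector of similarity scores over sentence pairs, purely to show how the two formulations exchange pseudo-labels and how mutual-distillation averages them:

```python
import numpy as np

def distill(student, teacher, lr=0.5):
    # One distillation step: nudge the student's similarity scores
    # toward the teacher's pseudo-labels (gradient step on an MSE loss).
    return student + lr * (teacher - student)

def trans_encoder_cycle(bi_scores, cross_scores, cycles=10, lr=0.5):
    # Alternate between the two task formulations: the cross-encoder's
    # scores serve as pseudo-labels for the bi-encoder, then the refreshed
    # bi-encoder scores serve as pseudo-labels for the cross-encoder.
    for _ in range(cycles):
        bi_scores = distill(bi_scores, cross_scores, lr)
        cross_scores = distill(cross_scores, bi_scores, lr)
    return bi_scores, cross_scores

def mutual_pseudo_labels(score_sets):
    # Mutual-distillation extension: average the pseudo-labels
    # produced by several PLMs run in parallel.
    return np.mean(score_sets, axis=0)

# Toy similarity scores for three sentence pairs under each formulation.
bi = np.array([0.1, 0.9, 0.4])
cross = np.array([0.3, 0.7, 0.6])
bi_out, cross_out = trans_encoder_cycle(bi, cross)
```

With each alternation the two score vectors move toward one another, mirroring how the bi- and cross-encoder formulations converge on a shared notion of similarity over the iterations.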

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Semantic Textual Similarity | STS14 | Spearman Correlation | 0.8194 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS14 | Spearman Correlation | 0.8176 | Trans-Encoder-RoBERTa-large-bi (unsup.) |
| Semantic Textual Similarity | STS14 | Spearman Correlation | 0.8137 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS14 | Spearman Correlation | 0.7903 | Trans-Encoder-RoBERTa-base-cross (unsup.) |
| Semantic Textual Similarity | STS14 | Spearman Correlation | 0.779 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | STS15 | Spearman Correlation | 0.8863 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS15 | Spearman Correlation | 0.8816 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS15 | Spearman Correlation | 0.8577 | Trans-Encoder-RoBERTa-base-cross (unsup.) |
| Semantic Textual Similarity | STS15 | Spearman Correlation | 0.8508 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | STS15 | Spearman Correlation | 0.8444 | Trans-Encoder-BERT-base-cross (unsup.) |
| Semantic Textual Similarity | SICK | Spearman Correlation | 0.7276 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | SICK | Spearman Correlation | 0.7192 | Trans-Encoder-BERT-large-cross (unsup.) |
| Semantic Textual Similarity | SICK | Spearman Correlation | 0.7163 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | SICK | Spearman Correlation | 0.7133 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | SICK | Spearman Correlation | 0.6952 | Trans-Encoder-BERT-base-cross (unsup.) |
| Semantic Textual Similarity | STS13 | Spearman Correlation | 0.8851 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS13 | Spearman Correlation | 0.8831 | Trans-Encoder-BERT-large-cross (unsup.) |
| Semantic Textual Similarity | STS13 | Spearman Correlation | 0.8831 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS13 | Spearman Correlation | 0.8559 | Trans-Encoder-BERT-base-cross (unsup.) |
| Semantic Textual Similarity | STS13 | Spearman Correlation | 0.851 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.867 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.8655 | Trans-Encoder-RoBERTa-large-bi (unsup.) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.8616 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.8465 | Trans-Encoder-RoBERTa-base-cross (unsup.) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.839 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | STS12 | Spearman Correlation | 0.7828 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS12 | Spearman Correlation | 0.7819 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS12 | Spearman Correlation | 0.7637 | Trans-Encoder-RoBERTa-base-cross (unsup.) |
| Semantic Textual Similarity | STS12 | Spearman Correlation | 0.7509 | Trans-Encoder-BERT-base-bi (unsup.) |
| Semantic Textual Similarity | STS16 | Spearman Correlation | 0.8503 | Trans-Encoder-RoBERTa-large-cross (unsup.) |
| Semantic Textual Similarity | STS16 | Spearman Correlation | 0.8481 | Trans-Encoder-BERT-large-bi (unsup.) |
| Semantic Textual Similarity | STS16 | Spearman Correlation | 0.8377 | Trans-Encoder-RoBERTa-base-cross (unsup.) |
| Semantic Textual Similarity | STS16 | Spearman Correlation | 0.8305 | Trans-Encoder-BERT-base-bi (unsup.) |
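Every result above reports Spearman rank correlation between model similarity scores and human similarity annotations. For reference, the statistic is the Pearson correlation computed over ranks; a minimal pure-Python sketch (assuming no tied values — evaluation toolkits such as SciPy's `spearmanr` additionally handle ties via average ranks) looks like:

```python
def spearman(x, y):
    # Spearman rank correlation = Pearson correlation of the ranks.
    # Simplified sketch: assumes all values in x (and in y) are distinct.
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it depends only on ranks, the metric rewards a model for ordering sentence pairs by similarity the same way humans do, regardless of the absolute scale of its scores.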
