TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/TSDAE: Using Transformer-based Sequential Denoising Auto-E...

TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

Kexin Wang, Nils Reimers, Iryna Gurevych

2021-04-14DenoisingParaphrase IdentificationSentence EmbeddingSentence EmbeddingsSemantic Textual SimilarityInformation RetrievalRe-RankingSTSSentence-EmbeddingLanguage ModellingDomain Adaptation
PaperPDFCodeCode(official)CodeCode(official)Code(official)Code

Abstract

Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong domain adaptation and pre-training method for sentence embeddings, significantly outperforming other approaches like Masked Language Model. A crucial shortcoming of previous studies is the narrow evaluation: Most work mainly evaluates on the single task of Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear if these proposed methods generalize to other domains and tasks. We fill this gap and evaluate TSDAE and other recent approaches on four different datasets from heterogeneous domains.

Results

TaskDatasetMetricValueModel
Semantic Textual SimilarityPITAP69.2TSDAE
Semantic Textual SimilarityTURLAP76.8TSDAE
Paraphrase IdentificationPITAP69.2TSDAE
Paraphrase IdentificationTURLAP76.8TSDAE
Information RetrievalCQADupStackmAP@1000.145TSDAE

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment2025-07-20fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models2025-07-17SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17Making Language Model a Hierarchical Classifier and Generator2025-07-17