Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Published: 2020-11-02 · EMNLP 2020
Tasks: Sentence Embedding, Sentence Embeddings, Semantic Similarity, Semantic Textual Similarity, Sentence-Embedding, Language Modelling
Links: Paper · PDF · Code (official)

Abstract

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from pre-trained language models without fine-tuning have been found to poorly capture the semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
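The core idea, mapping an anisotropic embedding distribution to an isotropic one via an invertible transform, can be sketched with the simplest member of that family: a single affine (ZCA whitening) map. This is a minimal illustration, not the paper's method; BERT-flow learns a deeper stack of invertible layers by maximizing likelihood under a standard Gaussian. The synthetic embeddings below are an assumption for demonstration purposes.

```python
import numpy as np

def whiten(embeddings):
    """Map embeddings to a zero-mean, identity-covariance (isotropic) space.

    A single affine invertible transform -- a crude stand-in for the learned
    normalizing-flow stack in BERT-flow, used here only to illustrate
    "anisotropic in, isotropic out".
    """
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings - mu, rowvar=False)
    # Eigendecomposition yields W with W @ cov @ W.T = I (ZCA whitening).
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (embeddings - mu) @ W.T

# Simulate an anisotropic embedding space: a few dominant directions,
# mimicking the non-smooth anisotropic cone the paper observes for BERT.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 8)) * np.array([10, 5, 1, 1, 1, 1, 0.5, 0.1])
iso = whiten(raw)
print(np.allclose(np.cov(iso, rowvar=False), np.eye(8), atol=1e-6))  # True
```

After the transform, Euclidean/cosine geometry in the new space is no longer dominated by a handful of high-variance directions, which is what makes similarity comparisons behave better.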

Results

| Task                        | Dataset       | Metric               | Value  | Model                   |
|-----------------------------|---------------|----------------------|--------|-------------------------|
| Semantic Textual Similarity | STS12         | Spearman Correlation | 0.652  | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS13         | Spearman Correlation | 0.7339 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS14         | Spearman Correlation | 0.6942 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS15         | Spearman Correlation | 0.7492 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS16         | Spearman Correlation | 0.7763 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.7226 | BERTlarge-flow (target) |
| Semantic Textual Similarity | SICK          | Spearman Correlation | 0.6544 | BERTbase-flow (NLI)     |
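All of the results above use Spearman correlation: the standard STS metric, which rank-transforms the model's similarity scores and the human gold scores and then takes their Pearson correlation. A minimal sketch (ignoring tie handling, which real implementations resolve with average ranks; the toy scores are made up for illustration):

```python
import numpy as np

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs.

    Simplified sketch -- ties are not averaged, unlike scipy.stats.spearmanr.
    """
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(len(a))
        return r
    rx, ry = ranks(np.asarray(x)), ranks(np.asarray(y))
    return np.corrcoef(rx, ry)[0, 1]

# Toy example: model similarity scores vs. human gold ratings.
pred = [0.1, 0.4, 0.35, 0.8]
gold = [1.0, 2.0, 3.0, 4.0]
print(round(spearman(pred, gold), 4))  # 0.8
```

Because only ranks matter, the metric rewards getting the ordering of sentence pairs right, regardless of the scale of the raw cosine similarities.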

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment (2025-07-20)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)