Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Published: 2020-11-02 · EMNLP 2020
Tasks: Sentence Embedding, Sentence Embeddings, Semantic Similarity, Semantic Textual Similarity, Sentence-Embedding, Language Modelling
Links: Paper · PDF · Code (official)

Abstract

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from pre-trained language models without fine-tuning have been found to poorly capture the semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
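The core idea, mapping an anisotropic embedding distribution to an isotropic one via an invertible transform, can be sketched with the simplest member of that family: a single affine (ZCA whitening) map. This is a minimal illustration, not the paper's method; BERT-flow learns a deeper stack of invertible layers by maximizing likelihood under a standard Gaussian. The synthetic embeddings below are an assumption for demonstration purposes.

```python
import numpy as np

def whiten(embeddings):
    """Map embeddings to a zero-mean, identity-covariance (isotropic) space.

    A single affine invertible transform -- a crude stand-in for the learned
    normalizing-flow stack in BERT-flow, used here only to illustrate
    "anisotropic in, isotropic out".
    """
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings - mu, rowvar=False)
    # Eigendecomposition yields W with W @ cov @ W.T = I (ZCA whitening).
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (embeddings - mu) @ W.T

# Simulate an anisotropic embedding space: a few dominant directions,
# mimicking the non-smooth anisotropic cone the paper observes for BERT.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 8)) * np.array([10, 5, 1, 1, 1, 1, 0.5, 0.1])
iso = whiten(raw)
print(np.allclose(np.cov(iso, rowvar=False), np.eye(8), atol=1e-6))  # True
```

After the transform, Euclidean/cosine geometry in the new space is no longer dominated by a handful of high-variance directions, which is what makes similarity comparisons behave better.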

Results

| Task                        | Dataset       | Metric               | Value  | Model                   |
|-----------------------------|---------------|----------------------|--------|-------------------------|
| Semantic Textual Similarity | STS12         | Spearman Correlation | 0.652  | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS13         | Spearman Correlation | 0.7339 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS14         | Spearman Correlation | 0.6942 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS15         | Spearman Correlation | 0.7492 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS16         | Spearman Correlation | 0.7763 | BERTlarge-flow (target) |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.7226 | BERTlarge-flow (target) |
| Semantic Textual Similarity | SICK          | Spearman Correlation | 0.6544 | BERTbase-flow (NLI)     |
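All of the results above use Spearman correlation: the standard STS metric, which rank-transforms the model's similarity scores and the human gold scores and then takes their Pearson correlation. A minimal sketch (ignoring tie handling, which real implementations resolve with average ranks; the toy scores are made up for illustration):

```python
import numpy as np

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs.

    Simplified sketch -- ties are not averaged, unlike scipy.stats.spearmanr.
    """
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(len(a))
        return r
    rx, ry = ranks(np.asarray(x)), ranks(np.asarray(y))
    return np.corrcoef(rx, ry)[0, 1]

# Toy example: model similarity scores vs. human gold ratings.
pred = [0.1, 0.4, 0.35, 0.8]
gold = [1.0, 2.0, 3.0, 4.0]
print(round(spearman(pred, gold), 4))  # 0.8
```

Because only ranks matter, the metric rewards getting the ordering of sentence pairs right, regardless of the scale of the raw cosine similarities.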

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment (2025-07-20)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)