Semantic Textual Similarity on STS Benchmark
Metric: Pearson Correlation (higher is better)
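For readers unfamiliar with the metric, the following is a minimal sketch of how a Pearson correlation between predicted and gold similarity scores could be computed. The function name `pearson_correlation` and the sample scores are hypothetical and chosen only for illustration; they are not taken from the benchmark or from any listed paper, and individual submissions may have used different tooling.

```python
import math


def pearson_correlation(predicted, gold):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two score pairs")

    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n

    # Sum of products of deviations (unnormalized covariance).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)

    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_g)


# Hypothetical scores, made up for illustration only.
gold = [0.0, 1.5, 2.5, 4.0, 5.0]
predicted = [0.3, 1.2, 2.9, 3.8, 4.7]

r = pearson_correlation(predicted, gold)
print(f"Pearson r = {r:.3f}")
```

A value of 1.0 means the predictions are a perfect increasing linear function of the gold scores, which is why higher is better on this leaderboard.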
Results
| # | Model | Pearson Correlation | Extra Data | SOTA | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | MT-DNN-SMART | 0.929 | No | SOTA | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Code |
| 2 | StructBERTRoBERTa ensemble | 0.928 | No | SOTA | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | 2019-08-13 | - |
| 3 | Mnet-Sim | 0.927 | No | - | MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity | 2021-11-09 | - |
| 4 | T5-11B | 0.925 | No | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 5 | ALBERT | 0.925 | Yes | - | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 2019-09-26 | Code |
| 6 | XLNet (single model) | 0.925 | No | SOTA | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2019-06-19 | Code |
| 7 | RoBERTa | 0.922 | No | - | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Code |
| 8 | ELECTRA | 0.921 | No | - | - | - | - |
| 9 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 0.919 | No | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | 2022-08-15 | Code |
| 10 | PSQ (Chen et al., 2020) | 0.919 | No | - | A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | 2020-10-27 | Code |
| 11 | RoBERTa-large 355M + Entailment as Few-shot Learner | 0.918 | No | - | Entailment as Few-Shot Learner | 2021-04-29 | Code |
| 12 | ERNIE 2.0 Large | 0.912 | No | - | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Code |
| 13 | Q-BERT (Shen et al., 2020) | 0.911 | No | - | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | 2019-09-12 | - |
| 14 | Q8BERT (Zafrir et al., 2019) | 0.911 | No | - | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Code |
| 15 | ELECTRA (no tricks) | 0.91 | No | - | - | - | - |
| 16 | DistilBERT 66M | 0.907 | No | - | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 2019-10-02 | Code |
| 17 | T5-3B | 0.906 | No | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 18 | MLM+ del-word | 0.905 | No | - | CLEAR: Contrastive Learning for Sentence Representation | 2020-12-31 | - |
| 19 | RealFormer | 0.9011 | No | - | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 20 | T5-Large | 0.899 | No | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 21 | SpanBERT | 0.899 | No | - | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 2019-07-24 | Code |
| 22 | T5-Base | 0.894 | No | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 23 | ERNIE 2.0 Base | 0.876 | No | - | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Code |
| 24 | Charformer-Tall | 0.873 | No | - | Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | 2021-06-23 | Code |
| 25 | T5-Small | 0.856 | No | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 26 | ERNIE | 0.832 | No | SOTA | ERNIE: Enhanced Language Representation with Informative Entities | 2019-05-17 | Code |
| 27 | 24hBERT | 0.82 | No | - | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 28 | TinyBERT-4 14.5M | 0.799 | No | - | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Code |
| 29 | USE_T | 0.782 | No | SOTA | Universal Sentence Encoder | 2018-03-29 | Code |