Natural Language Inference on MultiNLI
Metric: Matched accuracy (%, higher is better)
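A "Matched" score is the share of correctly labelled premise/hypothesis pairs in MultiNLI's matched evaluation split, which draws on the same genres as the training data. The minimal sketch below shows that arithmetic. The function name `matched_accuracy` and the toy labels are hypothetical, and the sketch assumes plain string labels (entailment / neutral / contradiction).

```python
from typing import Sequence


def matched_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of examples whose predicted label equals the gold label.

    Hypothetical helper: it assumes one string label per matched-split example.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Hypothetical toy predictions on four matched-split examples.
    gold = ["entailment", "neutral", "contradiction", "neutral"]
    preds = ["entailment", "contradiction", "contradiction", "neutral"]
    print(matched_accuracy(preds, gold))  # 75.0
```

A leaderboard value such as 92.6 would therefore mean that about 92.6% of matched-split pairs were labelled correctly.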
Results
SOTA marks entries flagged as state of the art on the leaderboard, meaning they set a new best Matched score as of their publication date. Extra data means the model used additional training data. Code means an implementation is linked.

| # | Model | Matched | SOTA | Extra data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.6 | | No | - | - | No |
| 2 | UnitedSynT5 (3B) | 92.6 | Yes | Yes | First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | 2024-12-12 | No |
| 3 | T5 | 92 | | No | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Yes |
| 4 | T5-XXL 11B (fine-tuned) | 92 | Yes | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 5 | T5-3B | 91.4 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 6 | ALBERT | 91.3 | Yes | No | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 2019-09-26 | Yes |
| 7 | DeBERTa (large) | 91.1 | | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Yes |
| 8 | Adv-RoBERTa ensemble | 91.1 | Yes | No | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | 2019-08-13 | No |
| 9 | RoBERTa | 90.8 | | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Yes |
| 10 | XLNet (single model) | 90.8 | Yes | No | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2019-06-19 | Yes |
| 11 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 90.2 | | No | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | 2022-08-15 | Yes |
| 12 | T5-Large | 89.9 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 13 | PSQ (Chen et al., 2020) | 89.9 | | No | A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | 2020-10-27 | Yes |
| 14 | UnitedSynT5 (335M) | 89.8 | | Yes | First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | 2024-12-12 | No |
| 15 | ERNIE 2.0 Large | 88.7 | | No | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Yes |
| 16 | SpanBERT | 88.1 | | No | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 2019-07-24 | Yes |
| 17 | BERT-Large | 88 | | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Yes |
| 18 | ASA + RoBERTa | 88 | | No | Adversarial Self-Attention for Language Understanding | 2022-06-25 | Yes |
| 19 | MT-DNN-ensemble | 87.9 | Yes | No | Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | 2019-04-20 | Yes |
| 20 | Q-BERT (Shen et al., 2020) | 87.8 | | No | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | 2019-09-12 | No |
| 21 | Snorkel MeTaL (ensemble) | 87.6 | Yes | No | Training Complex Models with Multi-Task Weak Supervision | 2018-10-05 | Yes |
| 22 | BigBird | 87.5 | | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Yes |
| 23 | T5-Base | 87.1 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 24 | MT-DNN | 86.7 | | No | Multi-Task Deep Neural Networks for Natural Language Understanding | 2019-01-31 | Yes |
| 25 | BERT-LARGE | 86.7 | | No | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018-10-11 | Yes |
| 26 | RealFormer | 86.28 | | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Yes |
| 27 | gMLP-large | 86.2 | | No | Pay Attention to MLPs | 2021-05-17 | Yes |
| 28 | ERNIE 2.0 Base | 86.1 | | No | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Yes |
| 29 | Q8BERT (Zafrir et al., 2019) | 85.6 | | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Yes |
| 30 | ASA + BERT-base | 85 | | No | Adversarial Self-Attention for Language Understanding | 2022-06-25 | Yes |
| 31 | TinyBERT-6 67M | 84.6 | | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Yes |
| 32 | ELC-BERT-base 98M (zero init) | 84.4 | | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | No |
| 33 | 24hBERT | 84.4 | | No | How to Train BERT with an Academic Budget | 2021-04-15 | Yes |
| 34 | ERNIE | 84 | | No | ERNIE: Enhanced Language Representation with Informative Entities | 2019-05-17 | Yes |
| 35 | Charformer-Tall | 83.7 | | No | Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | 2021-06-23 | Yes |
| 36 | LTG-BERT-base 98M | 83 | | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | No |
| 37 | TinyBERT-4 14.5M | 82.5 | | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Yes |
| 38 | T5-Small | 82.4 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 39 | MFAE | 82.31 | | No | - | - | Yes |
| 40 | Finetuned Transformer LM | 82.1 | | No | - | - | No |
| 41 | Finetuned Transformer LM | 82.1 | | No | - | - | Yes |
| 42 | SqueezeBERT | 82 | | No | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | 2020-06-19 | Yes |
| 43 | GPST (unsupervised generative syntactic LM) | 81.8 | | No | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | 2024-03-13 | Yes |
| 44 | ELC-BERT-small 24M | 79.2 | | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | No |
| 45 | LTG-BERT-small 24M | 78 | | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | No |
| 46 | FNet-Large | 78 | | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Yes |
| 47 | aESIM | 73.9 | | No | Attention Boosted Sequential Inference Model | 2018-12-05 | No |
| 48 | T5-Large 738M | 72.4 | | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 49 | Multi-task BiLSTM + Attn | 72.2 | Yes | No | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | 2018-04-20 | Yes |
| 50 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 71.4 | | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 51 | GenSen | 71.4 | Yes | No | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning | 2018-03-30 | Yes |
| 52 | Bi-LSTM sentence encoder (max-pooling) | 70.7 | | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 53 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.7 | | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 54 | SWEM-max | 68.2 | | No | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | 2018-05-24 | Yes |
| 55 | LaMini-GPT 1.5B | 67.5 | | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 56 | LaMini-F-T5 783M | 61.4 | | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 57 | LaMini-T5 738M | 54.7 | | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 58 | GPT-2-XL 1.5B | 36.5 | | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
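The SOTA markers in the results above appear to flag each entry that raised the best Matched score as of its publication date. The minimal sketch below assumes that interpretation and re-derives the markers as a running maximum over publication dates. It uses a hand-picked subset of dated rows copied from the results, and the helper name `sota_entries` is hypothetical. Undated entries are left out, and a tie with the current best does not count as a new SOTA.

```python
from datetime import date

# Hand-picked subset of (model, Matched score, publication date) rows
# copied from the leaderboard results; undated entries are omitted.
ROWS = [
    ("GenSen", 71.4, date(2018, 3, 30)),
    ("Multi-task BiLSTM + Attn", 72.2, date(2018, 4, 20)),
    ("Snorkel MeTaL (ensemble)", 87.6, date(2018, 10, 5)),
    ("BERT-LARGE", 86.7, date(2018, 10, 11)),
    ("MT-DNN-ensemble", 87.9, date(2019, 4, 20)),
    ("XLNet (single model)", 90.8, date(2019, 6, 19)),
    ("RoBERTa", 90.8, date(2019, 7, 26)),
    ("Adv-RoBERTa ensemble", 91.1, date(2019, 8, 13)),
    ("ALBERT", 91.3, date(2019, 9, 26)),
    ("T5-XXL 11B (fine-tuned)", 92.0, date(2019, 10, 23)),
    ("T5", 92.0, date(2019, 11, 8)),
    ("UnitedSynT5 (3B)", 92.6, date(2024, 12, 12)),
]


def sota_entries(rows):
    """Return models that strictly beat every earlier-dated score (hypothetical helper)."""
    best = float("-inf")
    marked = []
    # Walk the rows in publication order and keep a running maximum.
    for model, score, published in sorted(rows, key=lambda row: row[2]):
        if score > best:  # a tie (e.g. RoBERTa vs. XLNet) is not a new SOTA
            best = score
            marked.append(model)
    return marked


if __name__ == "__main__":
    for name in sota_entries(ROWS):
        print(name)
```

On this subset the sketch returns the nine models flagged SOTA on the leaderboard, from GenSen through UnitedSynT5 (3B). BERT-LARGE, RoBERTa and T5 are excluded because an earlier entry already matched or beat their scores.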