Natural Language Inference on MultiNLI
Metric: Mismatched (higher is better)
Results
| # | Model | Mismatched | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.4 | No | - | - | - |
| 2 | T5 | 91.7 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 3 | T5-11B | 91.7 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 4 | T5-3B | 91.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | DeBERTa (large) | 91.1 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 6 | Adv-RoBERTa ensemble | 90.7 | No | StructBERT: Incorporating Language Structures in... | 2019-08-13 | - |
| 7 | RoBERTa (ensemble) | 90.2 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 8 | T5-Large 770M | 89.6 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 9 | ERNIE 2.0 Large | 88.8 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 10 | BERT-Large | 88 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 11 | MT-DNN-ensemble | 87.4 | No | Improving Multi-Task Deep Neural Networks via Kn... | 2019-04-20 | Code |
| 12 | Snorkel MeTaL (ensemble) | 87.2 | No | Training Complex Models with Multi-Task Weak Sup... | 2018-10-05 | Code |
| 13 | gMLP-large | 86.5 | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 14 | RealFormer | 86.34 | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 15 | T5-Base | 86.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 16 | MT-DNN | 86 | No | Multi-Task Deep Neural Networks for Natural Lang... | 2019-01-31 | Code |
| 17 | BERT-LARGE | 85.9 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 18 | ERNIE 2.0 Base | 85.5 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 19 | ELC-BERT-base 98M (zero init) | 84.5 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 20 | Charformer-Tall | 84.4 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 21 | 24hBERT | 83.8 | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 22 | LTG-BERT-base 98M | 83.4 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 23 | TinyBERT-6 67M | 83.2 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 24 | ERNIE | 83.2 | No | ERNIE: Enhanced Language Representation with Inf... | 2019-05-17 | Code |
| 25 | T5-Small | 82.3 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 26 | GPST(unsupervised generative syntactic LM) | 82 | No | Generative Pretrained Structured Transformers: U... | 2024-03-13 | Code |
| 27 | TinyBERT-4 14.5M | 81.8 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 28 | MFAE | 81.43 | No | - | - | Code |
| 29 | Finetuned Transformer LM | 81.4 | No | - | - | - |
| 30 | Finetuned Transformer LM | 81.4 | No | - | - | Code |
| 31 | SqueezeBERT | 81.1 | No | SqueezeBERT: What can computer vision teach NLP ... | 2020-06-19 | Code |
| 32 | ELC-BERT-small 24M | 79.9 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 33 | LTG-BERT-small 24M | 78.8 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 34 | FNet-Large | 76 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 35 | aESIM | 73.9 | No | Attention Boosted Sequential Inference Model | 2018-12-05 | - |
| 36 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 72.2 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 37 | Multi-task BiLSTM + Attn | 72.1 | No | GLUE: A Multi-Task Benchmark and Analysis Platfo... | 2018-04-20 | Code |
| 38 | T5-Large 738M | 72 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 39 | GenSen | 71.3 | No | Learning General Purpose Distributed Sentence Re... | 2018-03-30 | Code |
| 40 | Bi-LSTM sentence encoder (max-pooling) | 71.1 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 41 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.5 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 42 | LaMini-GPT 1.5B | 69.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 43 | SWEM-max | 67.7 | No | Baseline Needs More Love: On Simple Word-Embeddi... | 2018-05-24 | Code |
| 44 | LaMini-F-T5 783M | 61 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 45 | LaMini-T5 738M | 55.8 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 46 | GPT-2-XL 1.5B | 37 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |