Natural Language Inference on MultiNLI
Metric: Mismatched (higher is better)
Results
| # | Model | Mismatched | SOTA flag | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.4 | - | No | - | - | - |
| 2 | T5 | 91.7 | - | No | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Yes |
| 3 | T5-11B | 91.7 | SOTA | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 4 | T5-3B | 91.2 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 5 | DeBERTa (large) | 91.1 | - | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Yes |
| 6 | Adv-RoBERTa ensemble | 90.7 | SOTA | No | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | 2019-08-13 | - |
| 7 | RoBERTa (ensemble) | 90.2 | SOTA | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Yes |
| 8 | T5-Large 770M | 89.6 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 9 | ERNIE 2.0 Large | 88.8 | - | No | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Yes |
| 10 | BERT-Large | 88 | - | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Yes |
| 11 | MT-DNN-ensemble | 87.4 | SOTA | No | Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | 2019-04-20 | Yes |
| 12 | Snorkel MeTaL (ensemble) | 87.2 | SOTA | No | Training Complex Models with Multi-Task Weak Supervision | 2018-10-05 | Yes |
| 13 | gMLP-large | 86.5 | - | No | Pay Attention to MLPs | 2021-05-17 | Yes |
| 14 | RealFormer | 86.34 | - | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Yes |
| 15 | T5-Base | 86.2 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 16 | MT-DNN | 86 | - | No | Multi-Task Deep Neural Networks for Natural Language Understanding | 2019-01-31 | Yes |
| 17 | BERT-LARGE | 85.9 | - | No | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018-10-11 | Yes |
| 18 | ERNIE 2.0 Base | 85.5 | - | No | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Yes |
| 19 | ELC-BERT-base 98M (zero init) | 84.5 | - | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | - |
| 20 | Charformer-Tall | 84.4 | - | No | Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | 2021-06-23 | Yes |
| 21 | 24hBERT | 83.8 | - | No | How to Train BERT with an Academic Budget | 2021-04-15 | Yes |
| 22 | LTG-BERT-base 98M | 83.4 | - | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | - |
| 23 | TinyBERT-6 67M | 83.2 | - | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Yes |
| 24 | ERNIE | 83.2 | - | No | ERNIE: Enhanced Language Representation with Informative Entities | 2019-05-17 | Yes |
| 25 | T5-Small | 82.3 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 26 | GPST(unsupervised generative syntactic LM) | 82 | - | No | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | 2024-03-13 | Yes |
| 27 | TinyBERT-4 14.5M | 81.8 | - | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Yes |
| 28 | MFAE | 81.43 | - | No | - | - | Yes |
| 29 | Finetuned Transformer LM | 81.4 | - | No | - | - | - |
| 30 | Finetuned Transformer LM | 81.4 | - | No | - | - | Yes |
| 31 | SqueezeBERT | 81.1 | - | No | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | 2020-06-19 | Yes |
| 32 | ELC-BERT-small 24M | 79.9 | - | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | - |
| 33 | LTG-BERT-small 24M | 78.8 | - | No | Not all layers are equally as important: Every Layer Counts BERT | 2023-11-03 | - |
| 34 | FNet-Large | 76 | - | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Yes |
| 35 | aESIM | 73.9 | - | No | Attention Boosted Sequential Inference Model | 2018-12-05 | - |
| 36 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 72.2 | - | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 37 | Multi-task BiLSTM + Attn | 72.1 | SOTA | No | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | 2018-04-20 | Yes |
| 38 | T5-Large 738M | 72 | - | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 39 | GenSen | 71.3 | SOTA | No | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning | 2018-03-30 | Yes |
| 40 | Bi-LSTM sentence encoder (max-pooling) | 71.1 | - | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 41 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.5 | - | No | Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | 2018-11-02 | Yes |
| 42 | LaMini-GPT 1.5B | 69.3 | - | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 43 | SWEM-max | 67.7 | - | No | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | 2018-05-24 | Yes |
| 44 | LaMini-F-T5 783M | 61 | - | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 45 | LaMini-T5 738M | 55.8 | - | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |
| 46 | GPT-2-XL 1.5B | 37 | - | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Yes |