| Rank | Model | Accuracy (%) | Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.6 | No | - | - | - |
| 2 | UnitedSynT5 (3B) | 92.6 | Yes | First Train to Generate, then Generate to Train:... | 2024-12-12 | - |
| 3 | T5 | 92 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 4 | T5-XXL 11B (fine-tuned) | 92 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | T5-3B | 91.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 6 | ALBERT | 91.3 | No | ALBERT: A Lite BERT for Self-supervised Learning... | 2019-09-26 | Code |
| 7 | DeBERTa (large) | 91.1 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 8 | Adv-RoBERTa ensemble | 91.1 | No | StructBERT: Incorporating Language Structures in... | 2019-08-13 | - |
| 9 | RoBERTa | 90.8 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 10 | XLNet (single model) | 90.8 | No | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 11 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 90.2 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 12 | T5-Large | 89.9 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 13 | PSQ (Chen et al., 2020) | 89.9 | No | A Statistical Framework for Low-bitwidth Trainin... | 2020-10-27 | Code |
| 14 | UnitedSynT5 (335M) | 89.8 | Yes | First Train to Generate, then Generate to Train:... | 2024-12-12 | - |
| 15 | ERNIE 2.0 Large | 88.7 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 16 | SpanBERT | 88.1 | No | SpanBERT: Improving Pre-training by Representing... | 2019-07-24 | Code |
| 17 | BERT-Large | 88 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 18 | ASA + RoBERTa | 88 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 19 | MT-DNN-ensemble | 87.9 | No | Improving Multi-Task Deep Neural Networks via Kn... | 2019-04-20 | Code |
| 20 | Q-BERT (Shen et al., 2020) | 87.8 | No | Q-BERT: Hessian Based Ultra Low Precision Quanti... | 2019-09-12 | - |
| 21 | Snorkel MeTaL (ensemble) | 87.6 | No | Training Complex Models with Multi-Task Weak Sup... | 2018-10-05 | Code |
| 22 | BigBird | 87.5 | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Code |
| 23 | T5-Base | 87.1 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 24 | MT-DNN | 86.7 | No | Multi-Task Deep Neural Networks for Natural Lang... | 2019-01-31 | Code |
| 25 | BERT-Large | 86.7 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 26 | RealFormer | 86.28 | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 27 | gMLP-large | 86.2 | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 28 | ERNIE 2.0 Base | 86.1 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 29 | Q8BERT (Zafrir et al., 2019) | 85.6 | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Code |
| 30 | ASA + BERT-base | 85 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 31 | TinyBERT-6 67M | 84.6 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 32 | ELC-BERT-base 98M (zero init) | 84.4 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 33 | 24hBERT | 84.4 | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 34 | ERNIE | 84 | No | ERNIE: Enhanced Language Representation with Inf... | 2019-05-17 | Code |
| 35 | Charformer-Tall | 83.7 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 36 | LTG-BERT-base 98M | 83 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 37 | TinyBERT-4 14.5M | 82.5 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 38 | T5-Small | 82.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 39 | MFAE | 82.31 | No | - | - | Code |
| 40 | Finetuned Transformer LM | 82.1 | No | - | - | Code |
| 41 | SqueezeBERT | 82 | No | SqueezeBERT: What can computer vision teach NLP ... | 2020-06-19 | Code |
| 42 | GPST (unsupervised generative syntactic LM) | 81.8 | No | Generative Pretrained Structured Transformers: U... | 2024-03-13 | Code |
| 43 | ELC-BERT-small 24M | 79.2 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 44 | LTG-BERT-small 24M | 78 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 45 | FNet-Large | 78 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 46 | aESIM | 73.9 | No | Attention Boosted Sequential Inference Model | 2018-12-05 | - |
| 47 | T5-Large 738M | 72.4 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 48 | Multi-task BiLSTM + Attn | 72.2 | No | GLUE: A Multi-Task Benchmark and Analysis Platfo... | 2018-04-20 | Code |
| 49 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 71.4 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 50 | GenSen | 71.4 | No | Learning General Purpose Distributed Sentence Re... | 2018-03-30 | Code |
| 51 | Bi-LSTM sentence encoder (max-pooling) | 70.7 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 52 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.7 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 53 | SWEM-max | 68.2 | No | Baseline Needs More Love: On Simple Word-Embeddi... | 2018-05-24 | Code |
| 54 | LaMini-GPT 1.5B | 67.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 55 | LaMini-F-T5 783M | 61.4 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 56 | LaMini-T5 738M | 54.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | GPT-2-XL 1.5B | 36.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
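
To sanity-check a row, the sketch below runs a leaderboard-style evaluation. It assumes the metric is matched-validation accuracy on MultiNLI (consistent with the scores above, e.g. BERT-Large at 86.7) and uses the public `roberta-large-mnli` checkpoint as a stand-in for the rank-9 RoBERTa entry; the checkpoint name, batch size, and label remapping are assumptions of this sketch, not details taken from the leaderboard.

```python
# Minimal sketch: reproduce one leaderboard-style number, assuming the metric is
# matched-validation accuracy on MultiNLI. Checkpoint, batch size, and label
# remapping are assumptions of this sketch, not leaderboard details.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # public checkpoint; stands in for the rank-9 RoBERTa row
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

ds = load_dataset("multi_nli", split="validation_matched")

# The checkpoint's label order (e.g. 2 -> "ENTAILMENT") differs from the
# dataset's (0 -> "entailment"), so build an id-to-id mapping once.
names = ds.features["label"].names
to_ds = {i: names.index(n.lower()) for i, n in model.config.id2label.items()}

correct = 0
for start in range(0, len(ds), 32):
    rows = ds[start:start + 32]  # slicing a Dataset yields a dict of column lists
    enc = tok(rows["premise"], rows["hypothesis"],
              padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        preds = model(**enc).logits.argmax(dim=-1).tolist()
    correct += sum(to_ds[p] == y for p, y in zip(preds, rows["label"]))

print(f"accuracy: {correct / len(ds):.4f}")
```

On this split the checkpoint should land within about a point of the 90.8 reported in the RoBERTa row; exact agreement is not expected, since leaderboard numbers are typically reported on the test split.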
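Several entries (ranks 11, 13, 20, and 29) report accuracy for low-bit variants of existing models rather than new architectures. As a rough illustration of the vector-wise int8 idea named in the LLM.int8() row, the toy sketch below quantizes a weight matrix with one scale per row and measures the rounding error; it is not the paper's implementation, which additionally keeps outlier feature dimensions in 16-bit.

```python
# Toy sketch of vector-wise int8 quantization: one scale per weight row, symmetric
# rounding, dequantize to float for the matmul. The real LLM.int8() scheme also
# carries outlier feature dimensions in 16-bit, which this sketch omits.
import torch

torch.manual_seed(0)
W = torch.randn(1024, 4096)                        # stand-in for a fine-tuned MLP weight
scale = W.abs().amax(dim=1, keepdim=True) / 127.0  # per-row ("vector-wise") scales
W_int8 = (W / scale).round().clamp(-127, 127).to(torch.int8)
W_hat = W_int8.to(torch.float32) * scale           # dequantized approximation of W
print("max abs rounding error:", (W - W_hat).abs().max().item())
```

The per-row scale keeps the quantization error proportional to each row's magnitude, which is why such 8-bit variants (e.g. the rank-11 RoBERTa entry at 90.2) track their full-precision baselines so closely.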