Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Natural Language Inference on MultiNLI

Metric: Matched accuracy (higher is better)
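MultiNLI is evaluated on two held-out sets: "matched" (same genres as the training data) and "mismatched" (out-of-genre). This leaderboard ranks by matched accuracy, i.e. the percentage of matched-set examples where the predicted NLI label equals the gold label. A minimal sketch of the metric (the label strings and toy data below are illustrative, not from any submission):

```python
def matched_accuracy(predictions, gold_labels):
    """Percentage of examples where the predicted NLI label equals the gold label."""
    assert len(predictions) == len(gold_labels) and gold_labels
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)

# Toy example: MultiNLI uses three labels - entailment / neutral / contradiction.
gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "neutral", "entailment", "entailment"]
print(f"{matched_accuracy(pred, gold):.1f}")  # 75.0
```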


Results

| # | Model | Matched | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|-------|------|------|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.6 | No | - | - | - |
| 2 | UnitedSynT5 (3B) | 92.6 | Yes | First Train to Generate, then Generate to Train:... | 2024-12-12 | - |
| 3 | T5 | 92 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 4 | T5-XXL 11B (fine-tuned) | 92 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | T5-3B | 91.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 6 | ALBERT | 91.3 | No | ALBERT: A Lite BERT for Self-supervised Learning... | 2019-09-26 | Code |
| 7 | DeBERTa (large) | 91.1 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 8 | Adv-RoBERTa ensemble | 91.1 | No | StructBERT: Incorporating Language Structures in... | 2019-08-13 | - |
| 9 | RoBERTa | 90.8 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 10 | XLNet (single model) | 90.8 | No | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 11 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 90.2 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 12 | T5-Large | 89.9 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 13 | PSQ (Chen et al., 2020) | 89.9 | No | A Statistical Framework for Low-bitwidth Trainin... | 2020-10-27 | Code |
| 14 | UnitedSynT5 (335M) | 89.8 | Yes | First Train to Generate, then Generate to Train:... | 2024-12-12 | - |
| 15 | ERNIE 2.0 Large | 88.7 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 16 | SpanBERT | 88.1 | No | SpanBERT: Improving Pre-training by Representing... | 2019-07-24 | Code |
| 17 | BERT-Large | 88 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 18 | ASA + RoBERTa | 88 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 19 | MT-DNN-ensemble | 87.9 | No | Improving Multi-Task Deep Neural Networks via Kn... | 2019-04-20 | Code |
| 20 | Q-BERT (Shen et al., 2020) | 87.8 | No | Q-BERT: Hessian Based Ultra Low Precision Quanti... | 2019-09-12 | - |
| 21 | Snorkel MeTaL (ensemble) | 87.6 | No | Training Complex Models with Multi-Task Weak Sup... | 2018-10-05 | Code |
| 22 | BigBird | 87.5 | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Code |
| 23 | T5-Base | 87.1 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 24 | MT-DNN | 86.7 | No | Multi-Task Deep Neural Networks for Natural Lang... | 2019-01-31 | Code |
| 25 | BERT-LARGE | 86.7 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 26 | RealFormer | 86.28 | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 27 | gMLP-large | 86.2 | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 28 | ERNIE 2.0 Base | 86.1 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 29 | Q8BERT (Zafrir et al., 2019) | 85.6 | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Code |
| 30 | ASA + BERT-base | 85 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 31 | TinyBERT-6 67M | 84.6 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 32 | ELC-BERT-base 98M (zero init) | 84.4 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 33 | 24hBERT | 84.4 | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 34 | ERNIE | 84 | No | ERNIE: Enhanced Language Representation with Inf... | 2019-05-17 | Code |
| 35 | Charformer-Tall | 83.7 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 36 | LTG-BERT-base 98M | 83 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 37 | TinyBERT-4 14.5M | 82.5 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 38 | T5-Small | 82.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 39 | MFAE | 82.31 | No | - | - | Code |
| 40 | Finetuned Transformer LM | 82.1 | No | - | - | - |
| 41 | Finetuned Transformer LM | 82.1 | No | - | - | Code |
| 42 | SqueezeBERT | 82 | No | SqueezeBERT: What can computer vision teach NLP ... | 2020-06-19 | Code |
| 43 | GPST (unsupervised generative syntactic LM) | 81.8 | No | Generative Pretrained Structured Transformers: U... | 2024-03-13 | Code |
| 44 | ELC-BERT-small 24M | 79.2 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 45 | LTG-BERT-small 24M | 78 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 46 | FNet-Large | 78 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 47 | aESIM | 73.9 | No | Attention Boosted Sequential Inference Model | 2018-12-05 | - |
| 48 | T5-Large 738M | 72.4 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 49 | Multi-task BiLSTM + Attn | 72.2 | No | GLUE: A Multi-Task Benchmark and Analysis Platfo... | 2018-04-20 | Code |
| 50 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 71.4 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 51 | GenSen | 71.4 | No | Learning General Purpose Distributed Sentence Re... | 2018-03-30 | Code |
| 52 | Bi-LSTM sentence encoder (max-pooling) | 70.7 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 53 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.7 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 54 | SWEM-max | 68.2 | No | Baseline Needs More Love: On Simple Word-Embeddi... | 2018-05-24 | Code |
| 55 | LaMini-GPT 1.5B | 67.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 56 | LaMini-F-T5 783M | 61.4 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | LaMini-T5 738M | 54.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 58 | GPT-2-XL 1.5B | 36.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |