Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Natural Language Inference on MultiNLI

Metric: Mismatched (higher is better)

Results

| # | Model | Mismatched | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92.4 | No | - | - | - |
| 2 | T5 | 91.7 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 3 | T5-11B | 91.7 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 4 | T5-3B | 91.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | DeBERTa (large) | 91.1 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 6 | Adv-RoBERTa ensemble | 90.7 | No | StructBERT: Incorporating Language Structures in... | 2019-08-13 | - |
| 7 | RoBERTa (ensemble) | 90.2 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 8 | T5-Large 770M | 89.6 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 9 | ERNIE 2.0 Large | 88.8 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 10 | BERT-Large | 88 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 11 | MT-DNN-ensemble | 87.4 | No | Improving Multi-Task Deep Neural Networks via Kn... | 2019-04-20 | Code |
| 12 | Snorkel MeTaL (ensemble) | 87.2 | No | Training Complex Models with Multi-Task Weak Sup... | 2018-10-05 | Code |
| 13 | gMLP-large | 86.5 | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 14 | RealFormer | 86.34 | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 15 | T5-Base | 86.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 16 | MT-DNN | 86 | No | Multi-Task Deep Neural Networks for Natural Lang... | 2019-01-31 | Code |
| 17 | BERT-LARGE | 85.9 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 18 | ERNIE 2.0 Base | 85.5 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 19 | ELC-BERT-base 98M (zero init) | 84.5 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 20 | Charformer-Tall | 84.4 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 21 | 24hBERT | 83.8 | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 22 | LTG-BERT-base 98M | 83.4 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 23 | TinyBERT-6 67M | 83.2 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 24 | ERNIE | 83.2 | No | ERNIE: Enhanced Language Representation with Inf... | 2019-05-17 | Code |
| 25 | T5-Small | 82.3 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 26 | GPST(unsupervised generative syntactic LM) | 82 | No | Generative Pretrained Structured Transformers: U... | 2024-03-13 | Code |
| 27 | TinyBERT-4 14.5M | 81.8 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 28 | MFAE | 81.43 | No | - | - | Code |
| 29 | Finetuned Transformer LM | 81.4 | No | - | - | - |
| 30 | Finetuned Transformer LM | 81.4 | No | - | - | Code |
| 31 | SqueezeBERT | 81.1 | No | SqueezeBERT: What can computer vision teach NLP ... | 2020-06-19 | Code |
| 32 | ELC-BERT-small 24M | 79.9 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 33 | LTG-BERT-small 24M | 78.8 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 34 | FNet-Large | 76 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 35 | aESIM | 73.9 | No | Attention Boosted Sequential Inference Model | 2018-12-05 | - |
| 36 | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 72.2 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 37 | Multi-task BiLSTM + Attn | 72.1 | No | GLUE: A Multi-Task Benchmark and Analysis Platfo... | 2018-04-20 | Code |
| 38 | T5-Large 738M | 72 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 39 | GenSen | 71.3 | No | Learning General Purpose Distributed Sentence Re... | 2018-03-30 | Code |
| 40 | Bi-LSTM sentence encoder (max-pooling) | 71.1 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 41 | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 70.5 | No | Combining Similarity Features and Deep Represent... | 2018-11-02 | Code |
| 42 | LaMini-GPT 1.5B | 69.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 43 | SWEM-max | 67.7 | No | Baseline Needs More Love: On Simple Word-Embeddi... | 2018-05-24 | Code |
| 44 | LaMini-F-T5 783M | 61 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 45 | LaMini-T5 738M | 55.8 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 46 | GPT-2-XL 1.5B | 37 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |