Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Natural Language Inference on ANLI test

Metric: A3 (higher is better)

Results

| # | Model | A3 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | T5-3B (explanation prompting) | 74.8 | No | - | - | - |
| 2 | PaLM 540B (Self Improvement, Self Consistency) | 67.9 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 3 | PaLM 540B (Self Improvement, CoT Prompting) | 67.3 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 4 | PaLM 2-L (one-shot) | 67.1 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 5 | PaLM 540B (Self Improvement, Standard-Prompting) | 66.9 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 6 | PaLM 540B (Self Consistency) | 63.4 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 7 | PaLM 540B (CoT Prompting) | 60.6 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 8 | T0-11B (explanation prompting) | 59.9 | No | - | - | - |
| 9 | PaLM 540B (Standard-Prompting) | 55.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 10 | PaLM 2-M (one-shot) | 54.5 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 11 | ChatGPT | 54.1 | No | A Systematic Study and Comprehensive Evaluation ... | 2023-05-29 | Code |
| 12 | PaLM 2-S (one-shot) | 53.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 13 | XLNet (Large) | 49.4 | Yes | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 14 | ALUM (RoBERTa-LARGE) | 48.4 | Yes | Adversarial Training for Large Neural Language M... | 2020-04-20 | Code |
| 15 | InfoBERT (RoBERTa) | 47.7 | Yes | InfoBERT: Improving Robustness of Language Model... | 2020-10-05 | Code |
| 16 | RoBERTa (Large) | 44.4 | Yes | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 17 | T0-3B (CoT fine-tuned) | 41.9 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 18 | GPT-3 | 40.2 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 19 | Flipped-3B | 37.73 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 20 | KiC-770M | 37.6 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 21 | Bloomberg GPT (one-shot) | 37.33 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 22 | GPT-NeoX (one-shot) | 36.17 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 23 | BLOOM 176B (one-shot) | 35.17 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 24 | OPT 66B (one-shot) | 34.92 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 25 | RoE-3B | 31.22 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |