Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Natural Language Inference on ANLI test

Metric: A2 (higher is better)
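The page does not define the metric beyond its name. Assuming A2 denotes classification accuracy (in percent) on the round-2 test split of ANLI, a score of this kind could be computed roughly as sketched below. The function name `accuracy_percent` and the toy labels are hypothetical and for illustration only; they are not taken from any leaderboard submission.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of examples whose predicted label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy labels (assumption: three-way NLI labels), not real ANLI data.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predictions = ["entailment", "contradiction", "contradiction", "neutral"]

score = accuracy_percent(predictions, gold)
print(f"{score:.1f}")  # 3 of 4 correct -> 75.0
```

Under this reading, higher values mean more test examples were labeled correctly, which matches the "higher is better" note above.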

Results

| # | Model | A2 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | T5-3B (explanation prompting) | 72.5 | No | - | - | - |
| 2 | PaLM 540B (Self Improvement, Self Consistency) | 66.5 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 3 | PaLM 540B (Self Improvement, CoT Prompting) | 65.3 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 4 | PaLM 540B (Self Improvement, Standard-Prompting) | 64.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 5 | PaLM 540B (Self Consistency) | 64.5 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 6 | PaLM 2-L (one-shot) | 63.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 7 | T0-11B (explanation prompting) | 60.6 | No | - | - | - |
| 8 | PaLM 540B (CoT Prompting) | 58.9 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 9 | PaLM 540B (Standard-Prompting) | 55.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 10 | ChatGPT | 52.6 | No | A Systematic Study and Comprehensive Evaluation ... | 2023-05-29 | Code |
| 11 | ALUM (RoBERTa-LARGE) | 52.1 | Yes | Adversarial Training for Large Neural Language M... | 2020-04-20 | Code |
| 12 | XLNet (Large) | 50.9 | Yes | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 13 | InfoBERT (RoBERTa) | 50.5 | Yes | InfoBERT: Improving Robustness of Language Model... | 2020-10-05 | Code |
| 14 | RoBERTa (Large) | 49.8 | Yes | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 15 | PaLM 2-M (one-shot) | 49.5 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 16 | PaLM 2-S (one-shot) | 48.8 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 17 | T0-3B (CoT fine-tuned) | 37.2 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 18 | Flipped-3B | 37.05 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 19 | KiC-770M | 35 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 20 | RoE-3B | 34.64 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 21 | Bloomberg GPT (one-shot) | 34.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 22 | OPT 66B (one-shot) | 34.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 23 | GPT-3 | 34 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 24 | BLOOM 176B (one-shot) | 33.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 25 | GPT-NeoX (one-shot) | 33.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |