Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Common Sense Reasoning on ARC (Challenge)

Metric: Accuracy (higher is better)
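ARC (Challenge) is a multiple-choice science-question benchmark, so the accuracy reported below reduces to the percentage of questions whose predicted answer key matches the gold key. As a minimal illustrative sketch (not the site's or any paper's actual evaluation code):

```python
def accuracy(predictions, gold):
    """Percentage of items where the predicted answer key equals the gold key."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

# Four ARC-style items, three answered correctly -> 75.0
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))
```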


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | GPT-4 (few-shot, k=25) | 96.4 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | PaLM 2 (few-shot, CoT, SC) | 95.1 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 3 | Shivaay (4B, few-shot, k=8) | 91.04 | No | - | - | - |
| 4 | StupidLLM | 91.03 | No | - | - | - |
| 5 | Claude 2 (few-shot, k=5) | 91 | No | - | - | - |
| 6 | Claude 1.3 (few-shot, k=5) | 90 | No | - | - | - |
| 7 | PaLM 540B (Self Improvement, Self Consistency) | 89.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 8 | PaLM 540B (Self Consistency) | 88.7 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 9 | PaLM 540B (Self Improvement, CoT Prompting) | 88.3 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 10 | PaLM 540B (Self Improvement, Standard-Prompting) | 87.2 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 11 | PaLM 540B (Standard-Prompting) | 87.1 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 12 | ST-MoE-32B 269B (fine-tuned) | 86.5 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 13 | Claude Instant 1.1 (few-shot, k=5) | 85.7 | No | - | - | - |
| 14 | GPT-3.5 (few-shot, k=25) | 85.2 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 15 | PaLM 540B (CoT Prompting) | 85.2 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 16 | LLaMA 3 8B + MoSLoRA (fine-tuned) | 81.5 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 17 | LLaMA-3 8B + MixLoRA | 79.9 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 18 | LLaMA-2 13B + MixLoRA | 69.9 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 19 | PaLM 2-L (1-shot) | 69.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 20 | GAL 120B (zero-shot) | 67.9 | Yes | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 21 | Camelidae-8×34B | 65.2 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 22 | PaLM 2-M (1-shot) | 64.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | FLAN 137B (few-shot, k=13) | 63.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 24 | FLAN 137B (zero-shot) | 63.1 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 25 | PaLM 2-S (1-shot) | 59.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 26 | LLaMA-2 7B + MixLoRA | 58.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 27 | LLaMA 33B (zero-shot) | 57.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 28 | ST-MoE-L 4.1B (fine-tuned) | 56.9 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 29 | LLaMA 65B (zero-shot) | 56 | Yes | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 30 | Mistral 7B (0-shot) | 55.5 | No | Mistral 7B | 2023-10-10 | Code |
| 31 | GPT-3 175B (1-shot) | 53.2 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 32 | LLaMA 13B (zero-shot) | 52.7 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 33 | GPT-3 (zero-shot) | 51.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 34 | GPT-3 175B (0-shot) | 51.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 35 | BLOOM 176B (1-shot) | 50.85 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 36 | GLaM 64B/64E (0-shot) | 50.3 | Yes | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 37 | UL2 20B (chain-of-thought + self-consistency) | 49.5 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 38 | Bloomberg GPT 50B (1-shot) | 48.63 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 39 | GLaM 64B/64E (1-shot) | 48.2 | Yes | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 40 | LLaMA 7B (zero-shot) | 47.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 41 | GPT-NeoX 20B (1-shot) | 45.39 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 42 | phi-1.5-web 1.3B (zero-shot) | 44.9 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 43 | OPT 66B (1-shot) | 44.54 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 44 | OPT-175B | 43.94 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 45 | UL2 20B (chain-of-thought) | 42.9 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 46 | SparseGPT (175B, 50% Sparsity) | 41.3 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 47 | SparseGPT (175B, 4:8 Sparsity) | 39.85 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 48 | SparseGPT (175B, 2:4 Sparsity) | 38.99 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 49 | Pythia 12B (5-shot) | 36.8 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 50 | BLOOM (few-shot, k=5) | 32.9 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 51 | Pythia 12B (0-shot) | 31.8 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 52 | OPT (few-shot, k=5) | 31.1 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 53 | UL2 20B (zero-shot) | 29.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 54 | OPT-175B (50% Sparsity) | 25.6 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
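Many entries are annotated "(few-shot, k=N)": the model is shown N solved exemplars in its prompt before the test question. The exemplar format below is a hypothetical sketch for illustration; each paper in the table formats its prompts differently.

```python
def build_prompt(exemplars, question, choices, k):
    """Assemble a k-shot multiple-choice prompt: k solved exemplars, then the test item."""
    blocks = []
    for ex in exemplars[:k]:
        opts = "\n".join(f"{label}. {text}" for label, text in ex["choices"])
        blocks.append(f"Question: {ex['question']}\n{opts}\nAnswer: {ex['answer']}")
    opts = "\n".join(f"{label}. {text}" for label, text in choices)
    blocks.append(f"Question: {question}\n{opts}\nAnswer:")
    return "\n\n".join(blocks)

exemplars = [
    {"question": "Which gas do plants absorb for photosynthesis?",
     "choices": [("A", "Oxygen"), ("B", "Carbon dioxide")],
     "answer": "B"},
]
prompt = build_prompt(exemplars,
                      "What force pulls objects toward Earth?",
                      [("A", "Gravity"), ("B", "Magnetism")],
                      k=1)
```

The model's completion after the final "Answer:" is then compared against the gold key, which is what the accuracy column scores.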