Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Common Sense Reasoning on ARC (Easy)

Metric: Accuracy (higher is better)
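As a concrete illustration of the metric, the sketch below computes accuracy over a batch of multiple-choice answers in the style of ARC (Easy). The predictions and gold labels are invented for illustration and do not correspond to any leaderboard entry.

```python
# Minimal sketch: accuracy on multiple-choice QA (ARC-Easy style).
# The prediction/gold data below is made up for illustration only.

def accuracy(predictions, gold):
    """Fraction of questions where the predicted choice matches the gold answer."""
    if not gold:
        raise ValueError("gold labels must be non-empty")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

predictions = ["A", "C", "B", "D", "A"]
gold        = ["A", "C", "D", "D", "B"]

print(f"Accuracy: {accuracy(predictions, gold):.1%}")  # 3 of 5 correct -> 60.0%
```

Leaderboard entries report this value as a percentage, so 60.0% would appear as 60 in the table below.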


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | ST-MoE-32B 269B (fine-tuned) | 95.2 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | 90.5 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 3 | PaLM 2-L (1-shot) | 89.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 4 | PaLM 2-M (1-shot) | 88 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 5 | LLaMA-3 8B + MixLoRA | 86.5 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 6 | Camelidae-8×34B | 86.2 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 7 | PaLM 2-S (1-shot) | 85.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 8 | LLaMA 65B + CFG (0-shot) | 84.2 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 9 | GAL 120B (0-shot) | 83.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 10 | LLaMA-2 13B + MixLoRA | 83.5 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 11 | LLaMA 30B + CFG (0-shot) | 83.2 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 12 | Mixtral 8x7B (0-shot) | 83.1 | No | Mixtral of Experts | 2024-01-08 | Code |
| 13 | FLAN 137B (few-shot, k=14) | 80.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 14 | Mistral 7B (0-shot) | 80.5 | No | Mixtral of Experts | 2024-01-08 | Code |
| 15 | LLaMA 33B (0-shot) | 80 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 16 | Mistral 7B (0-shot) | 80 | No | Mistral 7B | 2023-10-10 | Code |
| 17 | FLAN 137B (0-shot) | 79.6 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 18 | LLaMA 13B + CFG (0-shot) | 79.1 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 19 | LLaMA 65B (0-shot) | 78.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 20 | LLaMA-2 7B + MixLoRA | 77.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 21 | phi-1.5-web 1.3B (0-shot) | 76.1 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 22 | BLOOM 176B (1-shot) | 75.93 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 23 | ST-MoE-L 4.1B (fine-tuned) | 75.4 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 24 | GLaM (64B/64E) (5-shot) | 74.8 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 25 | LLaMA 13B (0-shot) | 74.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | Bloomberg GPT 50B (1-shot) | 73.99 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 27 | LLaMA 7B (0-shot) | 72.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 28 | Pythia 12B (5-shot) | 71.5 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 29 | OPT 66B (1-shot) | 71.25 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 30 | GPT-3 175B (1-shot) | 71.2 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 31 | OPT-175B | 71.04 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 32 | GPT-NeoX 20B (1-shot) | 70.79 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 33 | Pythia 12B (0-shot) | 70.2 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 34 | UL2 20B (chain-of-thought + self-consistency) | 69.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 35 | Mamba-2.8B (0-shot) | 69.7 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 36 | SparseGPT 175B (50% sparsity) | 69.65 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 37 | GPT-3 (0-shot) | 68.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 38 | GPT-3 175B (0-shot) | 68.8 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 39 | SparseGPT 175B (4:8 sparsity) | 68.35 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 40 | GLaM 64B/64E (0-shot) | 68 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 41 | SparseGPT 175B (2:4 sparsity) | 67.08 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 42 | LLaMA 7B + CFG (0-shot) | 58.9 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 43 | BLOOM (5-shot) | 40.7 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 44 | UL2 20B (chain-of-thought) | 38.4 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 45 | OPT (5-shot) | 37.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 46 | UL2 20B (0-shot) | 32.2 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 47 | OPT 175B (50% sparsity) | 28.03 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |