Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Common Sense Reasoning on ARC (Easy)

Metric: Accuracy (higher is better)
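As a concrete illustration of the metric, the sketch below computes accuracy over a batch of multiple-choice answers in the style of ARC (Easy). The predictions and gold labels are invented for illustration and do not correspond to any leaderboard entry.

```python
# Minimal sketch: accuracy on multiple-choice QA (ARC-Easy style).
# The prediction/gold data below is made up for illustration only.

def accuracy(predictions, gold):
    """Fraction of questions where the predicted choice matches the gold answer."""
    if not gold:
        raise ValueError("gold labels must be non-empty")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

predictions = ["A", "C", "B", "D", "A"]
gold        = ["A", "C", "D", "D", "B"]

print(f"Accuracy: {accuracy(predictions, gold):.1%}")  # 3 of 5 correct -> 60.0%
```

Leaderboard entries report this value as a percentage, so 60.0% would appear as 60 in the table below.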


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | ST-MoE-32B 269B (fine-tuned) | 95.2 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | 90.5 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 3 | PaLM 2-L (1-shot) | 89.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 4 | PaLM 2-M (1-shot) | 88 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 5 | LLaMA-3 8B + MixLoRA | 86.5 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 6 | Camelidae-8×34B | 86.2 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 7 | PaLM 2-S (1-shot) | 85.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 8 | LLaMA 65B + CFG (0-shot) | 84.2 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 9 | GAL 120B (0-shot) | 83.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 10 | LLaMA-2 13B + MixLoRA | 83.5 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 11 | LLaMA 30B + CFG (0-shot) | 83.2 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 12 | Mixtral 8x7B (0-shot) | 83.1 | No | Mixtral of Experts | 2024-01-08 | Code |
| 13 | FLAN 137B (few-shot, k=14) | 80.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 14 | Mistral 7B (0-shot) | 80.5 | No | Mixtral of Experts | 2024-01-08 | Code |
| 15 | LLaMA 33B (0-shot) | 80 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 16 | Mistral 7B (0-shot) | 80 | No | Mistral 7B | 2023-10-10 | Code |
| 17 | FLAN 137B (0-shot) | 79.6 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 18 | LLaMA 13B + CFG (0-shot) | 79.1 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 19 | LLaMA 65B (0-shot) | 78.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 20 | LLaMA-2 7B + MixLoRA | 77.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 21 | phi-1.5-web 1.3B (0-shot) | 76.1 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 22 | BLOOM 176B (1-shot) | 75.93 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 23 | ST-MoE-L 4.1B (fine-tuned) | 75.4 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 24 | GLaM (64B/64E) (5-shot) | 74.8 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 25 | LLaMA 13B (0-shot) | 74.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | Bloomberg GPT 50B (1-shot) | 73.99 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 27 | LLaMA 7B (0-shot) | 72.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 28 | Pythia 12B (5-shot) | 71.5 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 29 | OPT 66B (1-shot) | 71.25 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 30 | GPT-3 175B (1-shot) | 71.2 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 31 | OPT-175B | 71.04 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 32 | GPT-NeoX 20B (1-shot) | 70.79 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 33 | Pythia 12B (0-shot) | 70.2 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 34 | UL2 20B (chain-of-thought + self-consistency) | 69.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 35 | Mamba-2.8B (0-shot) | 69.7 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 36 | SparseGPT 175B (50% sparsity) | 69.65 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 37 | GPT-3 (0-shot) | 68.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 38 | GPT-3 175B (0-shot) | 68.8 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 39 | SparseGPT 175B (4:8 sparsity) | 68.35 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 40 | GLaM 64B/64E (0-shot) | 68 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 41 | SparseGPT 175B (2:4 sparsity) | 67.08 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 42 | LLaMA 7B + CFG (0-shot) | 58.9 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 43 | BLOOM (5-shot) | 40.7 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 44 | UL2 20B (chain-of-thought) | 38.4 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 45 | OPT (5-shot) | 37.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 46 | UL2 20B (0-shot) | 32.2 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 47 | OPT 175B (50% sparsity) | 28.03 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |