Common Sense Reasoning on WinoGrande

Metric: Accuracy (higher is better)
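
As a rough illustration of the metric, the sketch below computes accuracy as the percentage of items where the predicted option matches the gold option. It assumes each WinoGrande item offers two candidate fillers labelled "1" and "2", which is why a random baseline scores about 50. The function name `accuracy` and the toy data are hypothetical and are not taken from any listed paper's evaluation code.

```python
def accuracy(predictions, labels):
    """Return accuracy as a percentage: correct predictions / total items * 100."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the predicted option equals the gold option.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: gold and predicted option labels for four items.
labels = ["1", "2", "2", "1"]
predictions = ["1", "2", "1", "1"]
print(accuracy(predictions, labels))  # 3 of 4 correct -> 75.0
```

Scores in the table below are on this 0 to 100 scale. Note that the rows mix fine-tuned, zero-shot, and few-shot settings, so they are not strictly comparable.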

Results

#	Model	Accuracy	Extra Data	Paper	Date	Code
1	ST-MoE-32B 269B (fine-tuned)	96.1	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
2	Unicorn 11B (fine-tuned)	91.3	No	UNICORN on RAINBOW: A Universal Commonsense Reas...	2021-03-24	Code
3	CompassMTL 567M with Tailor	90.5	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
4	CompassMTL 567M	89.6	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
5	UnifiedQA 11B (fine-tuned)	89.4	No	UnifiedQA: Crossing Format Boundaries With a Sin...	2020-05-02	Code
6	Claude 3 Opus (5-shot)	88.5	No	-	-	-
7	GPT-4 (5-shot)	87.5	No	GPT-4 Technical Report	2023-03-15	Code
8	ExDeBERTa 567M	87	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
9	LLaMA-2 13B + MixLoRA	86.3	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
10	LLaMA-3 8B + MoSLoRA	85.8	No	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16	Code
11	PaLM 2-L (1-shot)	83	No	PaLM 2 Technical Report	2023-05-17	Code
12	LLaMA-3 8B + MixLoRA	82.1	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
13	ST-MoE-L 4.1B (fine-tuned)	81.7	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
14	GPT-3.5 (5-shot)	81.6	No	GPT-4 Technical Report	2023-03-15	Code
15	PaLM 540B (0-shot)	81.1	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
16	Camelidae-8×34B	80.9	No	Parameter-Efficient Sparsity Crafting from Dense...	2024-01-05	Code
17	PaLM 2-M (1-shot)	79.2	No	PaLM 2 Technical Report	2023-05-17	Code
18	RoBERTa-Winogrande 355M (fine-tuned)	79.1	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
19	PaLM 2-S (1-shot)	77.9	No	PaLM 2 Technical Report	2023-05-17	Code
20	Mixtral 8x7B (0-shot)	77.2	No	Mixtral of Experts	2024-01-08	Code
21	PaLM 62B (0-shot)	77	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
22	PaLM-cont 62B (0-shot)	77	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
23	LLaMA 65B (0-shot)	77	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
24	LLaMA-2 7B + MixLoRA	76.8	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
25	LLaMA 33B (0-shot)	76	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
26	Mistral 7B (0-shot)	75.3	No	Mistral 7B	2023-10-10	Code
27	Claude 3 Sonnet (5-shot)	75.1	No	-	-	-
28	Chinchilla 70B (0-shot)	74.9	No	Training Compute-Optimal Large Language Models	2022-03-29	Code
29	Claude 3 Haiku (5-shot)	74.2	No	-	-	-
30	Mistral 7B (0-shot)	74.2	No	Mixtral of Experts	2024-01-08	Code
31	phi-1.5-web 1.3B (0-shot)	74	No	Textbooks Are All You Need II: phi-1.5 technical...	2023-09-11	Code
32	UnifiedQA 406M (fine-tuned)	73.3	No	UnifiedQA: Crossing Format Boundaries With a Sin...	2020-05-02	Code
33	LLaMA 13B (0-shot)	73	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
34	FLAN 137B (few-shot, k=16)	72.8	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
35	G-DAUG-Combo + RoBERTa-Large	71.4	No	Generative Data Augmentation for Commonsense Rea...	2020-04-24	Code
36	FLAN 137B (0-shot)	71.2	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
37	RWKV v5 Eagle 7B	70.8	No	-	-	-
38	Branch-Train-MiX 4x7B (sampling top-1 expert)	70.6	No	Branch-Train-MiX: Mixing Expert LLMs into a Mixt...	2024-03-12	Code
39	GPT-3 175B (0-shot)	70.2	No	Language Models are Few-Shot Learners	2020-05-28	Code
40	Gopher 280B (0-shot)	70.1	No	Scaling Language Models: Methods, Analysis & Ins...	2021-12-08	Code
41	LLaMA 7B (0-shot)	70.1	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
42	BLOOM 176B (1-shot)	67	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
43	Pythia 12B (5-shot)	66.6	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
44	OPT 66B (1-shot)	66.1	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
45	BERT-Winogrande 345M (fine-tuned)	64.9	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
46	BloombergGPT (1-shot)	64.1	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
47	Pythia 12B (0-shot)	63.9	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
48	RoE-3B	61.6	No	Exploring the Benefits of Training Expert Langua...	2023-02-07	Code
49	Pythia 6.9B (0-shot)	60.9	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
50	GPT-NeoX (1-shot)	60.6	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
51	FLAN-T5-Large 783M	59.9	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
52	Pythia 2.8B (0-shot)	59.4	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
53	RoBERTa-DPR 355M (0-shot)	58.9	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
54	ALBERT-xxlarge 235M	58.7	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
55	Flipped-3B	58.56	No	Guess the Instruction! Flipped Learning Makes La...	2022-10-06	Code
56	GPT-2-XL 1.5B	58.3	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
57	T0-3B (CoT fine-tuned)	57.5	No	The CoT Collection: Improving Zero-shot and Few-...	2023-05-23	Code
58	GPT-3 Large 760M (0-shot)	57.4	No	Language Models are Few-Shot Learners	2020-05-28	Code
59	RoBERTa-base 125M	56.3	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
60	LaMini-F-T5 783M	56	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
61	LaMini-GPT 1.5B	56	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
62	BERT-large 345M	55.6	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
63	KiC-770M	55.3	No	Knowledge-in-Context: Towards Knowledgeable Semi...	2022-10-28	-
64	T5-Large 738M	55.2	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
65	LaMini-T5 738M	54.9	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
66	RoBERTa-large 355M	54.9	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
67	sMLP – deterministic 9.4B (0-shot)	54.3	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
68	Switch Transformer 9B (0-shot)	53.4	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
69	BERT-base 110M	53.1	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
70	ALBERT-base 11M	52.8	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
71	BERT-large 345M (0-shot)	51.9	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
72	HASH Layers 10B (0-shot)	51.7	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
73	Gshard 9B (0-shot)	51.1	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
74	Base Layers 10B (0-shot)	51	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
75	BERT-DPR 345M (0-shot)	51	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
76	Random baseline	50	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-
77	RoBERTa-large 355M (0-shot)	50	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code