Common Sense Reasoning on WinoGrande
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code | Badge |
|---|-------|----------|------------|-------|------|------|-------|
| 1 | ST-MoE-32B 269B (fine-tuned) | 96.1 | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Code | SOTA |
| 2 | Unicorn 11B (fine-tuned) | 91.3 | No | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 2021-03-24 | Code | SOTA |
| 3 | CompassMTL 567M with Tailor | 90.5 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | |
| 4 | CompassMTL 567M | 89.6 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | |
| 5 | UnifiedQA 11B (fine-tuned) | 89.4 | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Code | SOTA |
| 6 | Claude 3 Opus (5-shot) | 88.5 | No | - | - | - | |
| 7 | GPT-4 (5-shot) | 87.5 | No | GPT-4 Technical Report | 2023-03-15 | Code | |
| 8 | ExDeBERTa 567M | 87 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | |
| 9 | LLaMA-2 13B + MixLoRA | 86.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | |
| 10 | LLaMA3 8B+MoSLoRA | 85.8 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code | |
| 11 | PaLM 2-L (1-shot) | 83 | No | PaLM 2 Technical Report | 2023-05-17 | Code | |
| 12 | LLaMA-3 8B + MixLoRA | 82.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | |
| 13 | ST-MoE-L 4.1B (fine-tuned) | 81.7 | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Code | |
| 14 | GPT-3.5 (5-shot) | 81.6 | No | GPT-4 Technical Report | 2023-03-15 | Code | |
| 15 | PaLM 540B (0-shot) | 81.1 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | |
| 16 | Camelidae-8×34B | 80.9 | No | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | 2024-01-05 | Code | |
| 17 | PaLM 2-M (1-shot) | 79.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code | |
| 18 | RoBERTa-Winogrande 355M (fine-tuned) | 79.1 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | SOTA |
| 19 | PaLM 2-S (1-shot) | 77.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code | |
| 20 | Mixtral 8x7B (0-shot) | 77.2 | No | Mixtral of Experts | 2024-01-08 | Code | |
| 21 | PaLM 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | |
| 22 | PaLM-cont 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | |
| 23 | LLaMA 65B (0-shot) | 77 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | |
| 24 | LLaMA-2 7B + MixLoRA | 76.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | |
| 25 | LLaMA 33B (0-shot) | 76 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | |
| 26 | Mistral 7B (0-shot) | 75.3 | No | Mistral 7B | 2023-10-10 | Code | |
| 27 | Claude 3 Sonnet (5-shot) | 75.1 | No | - | - | - | |
| 28 | Chinchilla 70B (0-shot) | 74.9 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code | |
| 29 | Claude 3 Haiku (5-shot) | 74.2 | No | - | - | - | |
| 30 | Mistral 7B (0-shot) | 74.2 | No | Mixtral of Experts | 2024-01-08 | Code | |
| 31 | phi-1.5-web 1.3B (zero-shot) | 74 | No | Textbooks Are All You Need II: phi-1.5 technical report | 2023-09-11 | Code | |
| 32 | Unified QA 406M (fine-tuned) | 73.3 | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Code | |
| 33 | LLaMA 13B (0-shot) | 73 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | |
| 34 | FLAN 137B (few-shot, k=16) | 72.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code | |
| 35 | G-DAUG-Combo + RoBERTa-Large | 71.4 | No | Generative Data Augmentation for Commonsense Reasoning | 2020-04-24 | Code | |
| 36 | FLAN 137B (0-shot) | 71.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code | |
| 37 | RWKV v5 Eagle 7B | 70.8 | No | - | - | - | |
| 38 | Branch-Train-MiX 4x7B (sampling top-1 expert) | 70.6 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024-03-12 | Code | |
| 39 | GPT-3 175B (0-shot) | 70.2 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code | |
| 40 | Gopher 280B (0-shot) | 70.1 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | |
| 41 | LLaMA 7B (0-shot) | 70.1 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | |
| 42 | BLOOM 176B (1-shot) | 67 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | |
| 43 | Pythia 12B (5-shot) | 66.6 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code | |
| 44 | OPT 66B (1-shot) | 66.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | |
| 45 | BERT-Winogrande 345M (fine-tuned) | 64.9 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | |
| 46 | Bloomberg GPT (one-shot) | 64.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | |
| 47 | Pythia 12B (0-shot) | 63.9 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code | |
| 48 | RoE-3B | 61.6 | No | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 2023-02-07 | Code | |
| 49 | Pythia 6.9B (0-shot) | 60.9 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code | |
| 50 | GPT-NeoX (one-shot) | 60.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | |
| 51 | FLAN-T5-Large 783M | 59.9 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 52 | Pythia 2.8B (0-shot) | 59.4 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code | |
| 53 | RoBERTa-DPR 355M (0-shot) | 58.9 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | |
| 54 | ALBERT-xxlarge 235M | 58.7 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 55 | Flipped-3B | 58.56 | No | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | 2022-10-06 | Code | |
| 56 | GPT-2-XL 1.5B | 58.3 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 57 | T0-3B (CoT fine-tuned) | 57.5 | No | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 2023-05-23 | Code | |
| 58 | GPT-3 Large 760M (0-shot) | 57.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code | |
| 59 | RoBERTa-base 125M | 56.3 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 60 | LaMini-F-T5 783M | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 61 | LaMini-GPT 1.5B | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 62 | BERT-large 345M | 55.6 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 63 | KiC-770M | 55.3 | No | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | 2022-10-28 | - | |
| 64 | T5-Large 738M | 55.2 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 65 | LaMini-T5 738M | 54.9 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | |
| 66 | RoBERTa-large 355M | 54.9 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 67 | sMLP – deterministic 9.4B (0-shot) | 54.3 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | |
| 68 | Switch Transformer 9B (0-shot) | 53.4 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | |
| 69 | BERT-base 110M | 53.1 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 70 | ALBERT-base 11M | 52.8 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 71 | BERT-large 345M (0-shot) | 51.9 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | |
| 72 | HASH Layers 10B (0-shot) | 51.7 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | |
| 73 | Gshard 9B (0-shot) | 51.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | |
| 74 | Base Layers 10B (0-shot) | 51 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | |
| 75 | BERT-DPR 345M (0-shot) | 51 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | |
| 76 | Random baseline | 50 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - | |
| 77 | RoBERTa-large 355M (0-shot) | 50 | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Code | |