Common Sense Reasoning on CommonsenseQA
Metric: Accuracy (higher is better)
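Accuracy here is the percentage of questions whose predicted answer matches the gold answer. CommonsenseQA questions are five-way multiple choice, so uniform random guessing scores about 20%. The following is a minimal sketch of the metric. It assumes answers are encoded as the letter labels A to E. The function name `accuracy` and the example predictions are hypothetical and are not taken from any official evaluation script.

```python
def accuracy(predictions: list[str], answer_keys: list[str]) -> float:
    """Percentage of questions whose predicted label equals the gold label."""
    if len(predictions) != len(answer_keys):
        raise ValueError("predictions and answer_keys must have the same length")
    if not answer_keys:
        raise ValueError("need at least one question")
    # Each match counts as 1 (True); mismatches count as 0.
    correct = sum(p == g for p, g in zip(predictions, answer_keys))
    return 100.0 * correct / len(answer_keys)


# Hypothetical predictions for four questions, labelled A-E.
preds = ["A", "C", "E", "B"]
gold = ["A", "D", "E", "B"]
print(accuracy(preds, gold))  # 75.0

# Expected score of uniform guessing over five options.
print(100.0 / 5)  # 20.0
```

On this scale, an entry such as Few-shot Direct GPT-J (20.9) sits close to the chance level.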
| # | Model | Accuracy (%) | Extra data | SOTA when published | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | GPT-4o (HPT) | 92.54 | No | Yes | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | 2024-06-18 | Yes |
| 2 | DeBERTaV3-large+KEAR | 91.2 | Yes | Yes | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | 2021-12-06 | Yes |
| 3 | PaLM 2 (few-shot, CoT, SC) | 90.4 | Yes | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 4 | KEAR | 89.4 | Yes | No | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | 2021-12-06 | Yes |
| 5 | DEKCOR | 83.3 | Yes | Yes | Fusing Context Into Knowledge Graph for Commonsense Question Answering | 2020-12-09 | Yes |
| 6 | Unicorn 11B (fine-tuned) | 79.3 | No | No | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 2021-03-24 | Yes |
| 7 | MUPPET RoBERTa-Large | 79.2 | Yes | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Yes |
| 8 | UnifiedQA 11B (fine-tuned) | 79.1 | Yes | Yes | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Yes |
| 9 | DRAGON | 78.2 | No | No | Deep Bidirectional Language-Knowledge Graph Pretraining | 2022-10-17 | Yes |
| 10 | T5-XXL 11B (fine-tuned) | 78.1 | No | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Yes |
| 11 | ALBERT (ensemble; Lan et al., 2020) | 76.5 | No | Yes | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 2019-09-26 | Yes |
| 12 | UnifiedQA 11B (zero-shot) | 76.2 | No | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Yes |
| 13 | QA-GNN | 76.1 | No | No | QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | 2021-04-13 | Yes |
| 14 | XLNet+GraphReason | 75.3 | No | Yes | Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering | 2019-09-09 | Yes |
| 15 | GrapeQA: PEGA | 73.5 | No | No | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | 2023-03-22 | No |
| 16 | RoBERTa+HyKAS (Ma et al., 2019) | 73.2 | No | No | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | 2019-10-30 | No |
| 17 | GPT-3 Direct Finetuned | 73 | No | No | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | 2021-12-06 | Yes |
| 18 | STaR (on GPT-J) | 72.3 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 19 | RoBERTa-Large 355M | 72.1 | No | Yes | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Yes |
| 20 | STaR without Rationalization (on GPT-J) | 68.8 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 21 | OPT 66B (1-shot) | 66.4 | No | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 22 | BloombergGPT 50B (1-shot) | 65.5 | No | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 23 | CAGE-reasoning | 64.7 | No | Yes | Explain Yourself! Leveraging Language Models for Commonsense Reasoning | 2019-06-06 | Yes |
| 24 | BLOOM 176B (1-shot) | 64.2 | No | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 25 | UnifiedQA 440M (fine-tuned) | 64 | No | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Yes |
| 26 | BART-large 440M (fine-tuned) | 62.5 | No | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Yes |
| 27 | BERT_CSlarge | 62.2 | No | No | Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models | 2019-08-19 | No |
| 28 | GPT-NeoX 20B (1-shot) | 60.4 | No | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 29 | GPT-J Direct Finetuned | 60 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 30 | KagNet | 58.9 | Yes | No | KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning | 2019-09-04 | Yes |
| 31 | BERT-LARGE | 55.9 | Yes | Yes | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | 2018-11-02 | Yes |
| 32 | UL2 20B (chain-of-thought + self-consistency) | 55.7 | No | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 33 | Few-shot CoT LaMDA 137B | 55.6 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 34 | UL2 20B (chain-of-thought) | 51.4 | No | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 35 | Few-shot CoT GPT-J | 36.6 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 36 | UL2 20B (zero-shot) | 34.2 | No | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 37 | Chain of thought ASDiv | 28.6 | No | No | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | 2022-01-28 | Yes |
| 38 | Few-shot Direct GPT-J | 20.9 | No | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
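The entries marked SOTA appear to be the ones that beat every earlier result at the time of publication. The leaderboard does not state its rule, so the following is a minimal sketch under the assumption that a running maximum over publication date decides the flag. `Entry`, `sota_flags`, and `LEADERBOARD` are hypothetical names. The scores and dates are copied from the 38 entries above, and some model names are normalized in spelling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    accuracy: float  # CommonsenseQA accuracy, in percent
    date: str  # ISO 8601 (YYYY-MM-DD), so string order is chronological order


def sota_flags(entries: list[Entry]) -> set[str]:
    """Models whose accuracy beat every earlier result (assumed running-max rule)."""
    # Chronological order; on a shared date the best score comes first, so a
    # weaker same-day result cannot also be counted as a new best.
    ordered = sorted(entries, key=lambda e: (e.date, -e.accuracy))
    best_so_far = float("-inf")
    flagged: set[str] = set()
    for entry in ordered:
        if entry.accuracy > best_so_far:
            flagged.add(entry.model)
            best_so_far = entry.accuracy
    return flagged


LEADERBOARD = [
    Entry("GPT-4o (HPT)", 92.54, "2024-06-18"),
    Entry("DeBERTaV3-large+KEAR", 91.2, "2021-12-06"),
    Entry("PaLM 2 (few-shot, CoT, SC)", 90.4, "2023-05-17"),
    Entry("KEAR", 89.4, "2021-12-06"),
    Entry("DEKCOR", 83.3, "2020-12-09"),
    Entry("Unicorn 11B (fine-tuned)", 79.3, "2021-03-24"),
    Entry("MUPPET RoBERTa-Large", 79.2, "2021-01-26"),
    Entry("UnifiedQA 11B (fine-tuned)", 79.1, "2020-05-02"),
    Entry("DRAGON", 78.2, "2022-10-17"),
    Entry("T5-XXL 11B (fine-tuned)", 78.1, "2020-05-02"),
    Entry("ALBERT (ensemble; Lan et al., 2020)", 76.5, "2019-09-26"),
    Entry("UnifiedQA 11B (zero-shot)", 76.2, "2020-05-02"),
    Entry("QA-GNN", 76.1, "2021-04-13"),
    Entry("XLNet+GraphReason", 75.3, "2019-09-09"),
    Entry("GrapeQA: PEGA", 73.5, "2023-03-22"),
    Entry("RoBERTa+HyKAS (Ma et al., 2019)", 73.2, "2019-10-30"),
    Entry("GPT-3 Direct Finetuned", 73.0, "2021-12-06"),
    Entry("STaR (on GPT-J)", 72.3, "2022-03-28"),
    Entry("RoBERTa-Large 355M", 72.1, "2019-07-26"),
    Entry("STaR without Rationalization (on GPT-J)", 68.8, "2022-03-28"),
    Entry("OPT 66B (1-shot)", 66.4, "2023-03-30"),
    Entry("BloombergGPT 50B (1-shot)", 65.5, "2023-03-30"),
    Entry("CAGE-reasoning", 64.7, "2019-06-06"),
    Entry("BLOOM 176B (1-shot)", 64.2, "2023-03-30"),
    Entry("UnifiedQA 440M (fine-tuned)", 64.0, "2020-05-02"),
    Entry("BART-large 440M (fine-tuned)", 62.5, "2020-05-02"),
    Entry("BERT_CSlarge", 62.2, "2019-08-19"),
    Entry("GPT-NeoX 20B (1-shot)", 60.4, "2023-03-30"),
    Entry("GPT-J Direct Finetuned", 60.0, "2022-03-28"),
    Entry("KagNet", 58.9, "2019-09-04"),
    Entry("BERT-LARGE", 55.9, "2018-11-02"),
    Entry("UL2 20B (chain-of-thought + self-consistency)", 55.7, "2022-05-10"),
    Entry("Few-shot CoT LaMDA 137B", 55.6, "2022-03-28"),
    Entry("UL2 20B (chain-of-thought)", 51.4, "2022-05-10"),
    Entry("Few-shot CoT GPT-J", 36.6, "2022-03-28"),
    Entry("UL2 20B (zero-shot)", 34.2, "2022-05-10"),
    Entry("Chain of thought ASDiv", 28.6, "2022-01-28"),
    Entry("Few-shot Direct GPT-J", 20.9, "2022-03-28"),
]

print(sorted(sota_flags(LEADERBOARD)))
```

On these 38 entries, the rule selects the same nine models that the leaderboard marks as SOTA. The same-date tie-break matters for that match. T5-XXL 11B (fine-tuned) scored 78.1 on the same date as UnifiedQA 11B (fine-tuned), which scored 79.1. Because 78.1 already exceeds the previous best of 76.5, processing T5-XXL first would flag it incorrectly.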