Common Sense Reasoning on CommonsenseQA
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | GPT-4o (HPT) | 92.54 | No | Hierarchical Prompting Taxonomy: A Universal Eva... | 2024-06-18 | Code |
| 2 | DeBERTaV3-large+KEAR | 91.2 | Yes | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Code |
| 3 | PaLM 2 (few-shot, CoT, SC) | 90.4 | Yes | PaLM 2 Technical Report | 2023-05-17 | Code |
| 4 | KEAR | 89.4 | Yes | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Code |
| 5 | DEKCOR | 83.3 | Yes | Fusing Context Into Knowledge Graph for Commonse... | 2020-12-09 | Code |
| 6 | Unicorn 11B (fine-tuned) | 79.3 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 7 | MUPPET Roberta Large | 79.2 | Yes | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 8 | UnifiedQA 11B (fine-tuned) | 79.1 | Yes | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 9 | DRAGON | 78.2 | No | Deep Bidirectional Language-Knowledge Graph Pret... | 2022-10-17 | Code |
| 10 | T5-XXL 11B (fine-tuned) | 78.1 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 11 | Albert Lan et al. (2020) (ensemble) | 76.5 | No | ALBERT: A Lite BERT for Self-supervised Learning... | 2019-09-26 | Code |
| 12 | UnifiedQA 11B (zero-shot) | 76.2 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 13 | QA-GNN | 76.1 | No | QA-GNN: Reasoning with Language Models and Knowl... | 2021-04-13 | Code |
| 14 | XLNet+GraphReason | 75.3 | No | Graph-Based Reasoning over Heterogeneous Externa... | 2019-09-09 | Code |
| 15 | GrapeQA: PEGA | 73.5 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | - |
| 16 | RoBERTa+HyKAS Ma et al. (2019) | 73.2 | No | Towards Generalizable Neuro-Symbolic Systems for... | 2019-10-30 | - |
| 17 | GPT-3 Direct Finetuned | 73 | No | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Code |
| 18 | STaR (on GPT-J) | 72.3 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |
| 19 | RoBERTa-Large 355M | 72.1 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 20 | STaR without Rationalization (on GPT-J) | 68.8 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |
| 21 | OPT 66B (1-shot) | 66.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 22 | Bloomberg GPT 50B (1-shot) | 65.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 23 | CAGE-reasoning | 64.7 | No | Explain Yourself! Leveraging Language Models for... | 2019-06-06 | Code |
| 24 | BLOOM 176B (1-shot) | 64.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 25 | UnifiedQA 440M (fine-tuned) | 64 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 26 | BART-large 440M (fine-tuned) | 62.5 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 27 | BERT_CSlarge | 62.2 | No | Align, Mask and Select: A Simple Method for Inco... | 2019-08-19 | - |
| 28 | GPT-NeoX 20B (1-shot) | 60.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 29 | GPT-J Direct Finetuned | 60 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |
| 30 | KagNet | 58.9 | Yes | KagNet: Knowledge-Aware Graph Networks for Commo... | 2019-09-04 | Code |
| 31 | BERT-LARGE | 55.9 | Yes | CommonsenseQA: A Question Answering Challenge Ta... | 2018-11-02 | Code |
| 32 | UL2 20B (chain-of-thought + self-consistency) | 55.7 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 33 | Few-shot CoT LaMDA 137B | 55.6 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |
| 34 | UL2 20B (chain-of-thought) | 51.4 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 35 | Few-shot CoT GPT-J | 36.6 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |
| 36 | UL2 20B (zero-shot) | 34.2 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 37 | Chain of thought ASDiv | 28.6 | No | Chain-of-Thought Prompting Elicits Reasoning in ... | 2022-01-28 | Code |
| 38 | Few-shot Direct GPT-J | 20.9 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Code |