Question Answering on PIQA
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | Unicorn 11B (fine-tuned) | 90.1 | No | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 2021-03-24 | Code |
| 2 | LLaMA3 8B+MoSLoRA | 89.7 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 3 | CompassMTL 567M with Tailor | 88.3 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code |
| 4 | LLaMA-3 8B + MixLoRA | 87.6 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code |
| 5 | DeBERTa-Large 304M | 87.4 | No | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 | Code |
| 6 | CompassMTL 567M | 87.3 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code |
| 7 | LLaMA-2 13B + MixLoRA | 86.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code |
| 8 | Shakti-LLM (2.5B) | 86.2 | No | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | 2024-10-15 | - |
| 9 | DeBERTa-Large 304M (classification-based) | 85.9 | No | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 | Code |
| 10 | ExDeBERTa 567M | 85.5 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code |
| 11 | UnifiedQA 3B | 85.3 | No | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 | Code |
| 12 | PaLM 2-L (1-shot) | 85 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 13 | Mixtral 8x7B (0-shot) | 83.6 | No | Mixtral of Experts | 2024-01-08 | Code |
| 14 | PaLM 2-M (1-shot) | 83.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 15 | LLaMA-2 7B + MixLoRA | 83.2 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code |
| 16 | Mistral 7B (0-shot) | 83 | No | Mistral 7B | 2023-10-10 | Code |
| 17 | LLaMA 65B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 18 | LLaMA 2 70B (0-shot) | 82.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code |
| 19 | Camelidae-8×34B | 82.7 | No | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | 2024-01-05 | Code |
| 20 | LLaMA 33B (0-shot) | 82.3 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 21 | PaLM 2-S (1-shot) | 82.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 22 | Mistral 7B (0-shot) | 82.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 23 | MT-NLG 530B (0-shot) | 82 | No | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 2019-09-17 | Code |
| 24 | LLaMA 2 34B (0-shot) | 81.9 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code |
| 25 | Gopher 280B (0-shot) | 81.8 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code |
| 26 | Chinchilla 70B (0-shot) | 81.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 27 | FLAN 137B (few-shot, k=10) | 81.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 28 | OPT-175B | 81.07 | No | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 2023-01-02 | Code |
| 29 | GPT-3 175B (0-shot) | 81 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 30 | SparseGPT 175B (50% Sparsity) | 80.63 | No | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 2023-01-02 | Code |
| 31 | FLAN 137B (0-shot) | 80.5 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 32 | LLaMA 2 13B (0-shot) | 80.5 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code |
| 33 | LLaMA 13B (0-shot) | 80.1 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 34 | LLaMA 7B (0-shot) | 79.8 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 35 | SparseGPT 175B (4:8 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 2023-01-02 | Code |
| 36 | SparseGPT 175B (2:4 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 2023-01-02 | Code |
| 37 | RoBERTa-Large 355M | 79.4 | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Code |
| 38 | LLaMA 2 7B (0-shot) | 78.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code |
| 39 | Bloomberg GPT 50B (1-shot) | 77.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 40 | OPT 66B (1-shot) | 77.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 41 | RoBERTa-large 355M (fine-tuned) | 77.1 | No | PIQA: Reasoning about Physical Commonsense in Natural Language | 2019-11-26 | Code |
| 42 | phi-1.5-web (1.3B) | 77 | No | Textbooks Are All You Need II: phi-1.5 technical report | 2023-09-11 | Code |
| 43 | BLOOM 176B (1-shot) | 77 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 44 | Pythia 12B (5-shot) | 76.7 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code |
| 45 | Open-LLaMA-3B-v2 | 76.2 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code |
| 46 | Pythia 12B (0-shot) | 76 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code |
| 47 | Sheared-LLaMA-2.7B | 75.8 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code |
| 48 | GPT-NeoX 20B (1-shot) | 75.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 49 | Pythia 6.9B (0-shot) | 75.2 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code |
| 50 | Sheared-LLaMA-1.3B | 73.4 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code |
| 51 | sMLP - deterministic 9.4B (0-shot) | 73 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 52 | GPT-3 Large 760M (0-shot) | 72.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 53 | FLAN-T5-Large 783M | 72.2 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 54 | LaMini-GPT 1.5B | 71.3 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 55 | LaMini-F-T5 783M | 70.6 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 56 | GPT-2-XL 1.5B | 70.5 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 57 | Pythia 1B (5-shot) | 70.4 | No | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023-04-03 | Code |
| 58 | GPT-2-small 124M (fine-tuned) | 69.2 | No | PIQA: Reasoning about Physical Commonsense in Natural Language | 2019-11-26 | Code |
| 59 | Gshard 9B | 68.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 60 | LaMini-T5 738M | 67.2 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 61 | BERT-large 340M (fine-tuned) | 66.8 | No | PIQA: Reasoning about Physical Commonsense in Natural Language | 2019-11-26 | Code |
| 62 | BERT-Large 340M | 66.7 | No | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018-10-11 | Code |
| 63 | Base Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 64 | HASH Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 65 | T5-Large 738M | 55.9 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code |
| 66 | OPT-175B (50% Sparsity) | 54.73 | No | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 2023-01-02 | Code |
| 67 | Random chance baseline | 50 | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | Code |

Entries carrying the SOTA badge: #1 Unicorn 11B (fine-tuned), #11 UnifiedQA 3B, #23 MT-NLG 530B (0-shot), #37 RoBERTa-Large 355M, #62 BERT-Large 340M.