Question Answering on SIQA
Metric: Accuracy (higher is better)
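
As a minimal sketch of how the accuracy metric is typically computed for a multiple-choice task: the share of questions where the predicted option matches the gold option, expressed as a percentage. The function name, the toy data, and the assumption of three answer options per question are illustrative and not taken from this page; the three-option assumption is at least consistent with the 33.3 random chance baseline listed in the results.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: index of the chosen answer among three options.
gold = [0, 2, 1, 1, 0, 2]
pred = [0, 2, 1, 0, 0, 1]

score = accuracy(pred, gold)  # 4 of the 6 predictions match

# Expected accuracy of uniform random guessing over three options (assumed).
chance = 100.0 / 3
```

Under this definition a higher value is better, and any model scoring near `chance` is doing no better than guessing.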
Results
| # | Model | Accuracy | Extra Data | SOTA | Paper | Date |
|---|-------|----------|------------|------|-------|------|
| 1 | Unicorn 11B (fine-tuned) | 83.2 | No | SOTA | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 2021-03-24 |
| 2 | LLaMA-2 13B + MixLoRA | 82.5 | No | | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 |
| 3 | CompassMTL 567M with Tailor | 82.2 | No | | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 |
| 4 | CompassMTL 567M | 81.7 | No | | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 |
| 5 | LLaMA-3 8B + MoSLoRA (fine-tuned) | 81 | No | | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 |
| 6 | DeBERTa-Large 304M | 80.2 | No | | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 |
| 7 | DeBERTa-Large 304M (classification-based) | 79.9 | No | | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 |
| 8 | UnifiedQA 3B | 79.8 | No | SOTA | UnifiedQA: Crossing Format Boundaries With a Single QA System | 2020-05-02 |
| 9 | ExDeBERTa 567M | 79.6 | No | | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 |
| 10 | LLaMA-3 8B + MixLoRA | 78.8 | No | | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 |
| 11 | LLaMA-2 7B + MixLoRA | 78 | No | | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 |
| 12 | RoBERTa-Large 355M (fine-tuned) | 76.7 | No | SOTA | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 |
| 13 | BERT-large 340M (fine-tuned) | 64.5 | No | SOTA | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 |
| 14 | BERT-base 110M (fine-tuned) | 63.1 | No | | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 |
| 15 | GPT-1 117M (fine-tuned) | 63 | No | | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 |
| 16 | phi-1.5-web 1.3B (zero-shot) | 53 | No | | Textbooks Are All You Need II: phi-1.5 technical report | 2023-09-11 |
| 17 | phi-1.5 1.3B (zero-shot) | 52.6 | No | | Textbooks Are All You Need II: phi-1.5 technical report | 2023-09-11 |
| 18 | LLaMA 65B (zero-shot) | 52.3 | No | | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 |
| 19 | Chinchilla (zero-shot) | 51.3 | No | | Training Compute-Optimal Large Language Models | 2022-03-29 |
| 20 | Gopher (zero-shot) | 50.6 | No | | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 |
| 21 | LLaMA 13B (zero-shot) | 50.4 | No | | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 |
| 22 | LLaMA 33B (zero-shot) | 50.4 | No | | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 |
| 23 | LLaMA 7B (zero-shot) | 48.9 | No | | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 |
| 24 | Random chance baseline | 33.3 | No | | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 |