Question Answering on SIQA

Metric: Accuracy (higher is better)
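
As a rough illustration of the metric, the sketch below computes accuracy as the percentage of questions answered correctly. The toy predictions and labels are hypothetical, and the three-option setup is an assumption chosen to match the 33.3 random chance baseline listed in the results; this is not the evaluation script used by any of the listed papers.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: each index (0-2) selects one of three answer options.
gold = [0, 2, 1, 1, 0, 2]
pred = [0, 2, 1, 0, 0, 1]
toy_accuracy = accuracy(pred, gold)  # 4 of 6 predictions match

# Expected accuracy of uniform random guessing.
NUM_CHOICES = 3  # assumption: three answer options per question
chance_accuracy = 100.0 / NUM_CHOICES

print(f"toy accuracy: {toy_accuracy:.1f}")
print(f"chance accuracy: {chance_accuracy:.1f}")
```

Under that assumption, uniform guessing lands at about one third, which is why zero-shot scores near 50 are still well above chance.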


Results

#	Model	Accuracy	SOTA	Extra Data	Paper	Date	Code
1	Unicorn 11B (fine-tuned)	83.2	SOTA	No	UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark	2021-03-24	Code
2	LLaMA-2 13B + MixLoRA	82.5	-	No	MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	2024-04-22	Code
3	CompassMTL 567M with Tailor	82.2	-	No	Task Compass: Scaling Multi-task Pre-training with Task Prefix	2022-10-12	Code
4	CompassMTL 567M	81.7	-	No	Task Compass: Scaling Multi-task Pre-training with Task Prefix	2022-10-12	Code
5	LLaMA-3 8B + MoSLoRA (fine-tuned)	81	-	No	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16	Code
6	DeBERTa-Large 304M	80.2	-	No	Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering	2022-10-29	Code
7	DeBERTa-Large 304M (classification-based)	79.9	-	No	Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering	2022-10-29	Code
8	UnifiedQA 3B	79.8	SOTA	No	UnifiedQA: Crossing Format Boundaries With a Single QA System	2020-05-02	Code
9	ExDeBERTa 567M	79.6	-	No	Task Compass: Scaling Multi-task Pre-training with Task Prefix	2022-10-12	Code
10	LLaMA-3 8B + MixLoRA	78.8	-	No	MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	2024-04-22	Code
11	LLaMA-2 7B + MixLoRA	78	-	No	MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	2024-04-22	Code
12	RoBERTa-Large 355M (fine-tuned)	76.7	SOTA	No	RoBERTa: A Robustly Optimized BERT Pretraining Approach	2019-07-26	Code
13	BERT-large 340M (fine-tuned)	64.5	SOTA	No	SocialIQA: Commonsense Reasoning about Social Interactions	2019-04-22	Code
14	BERT-base 110M (fine-tuned)	63.1	-	No	SocialIQA: Commonsense Reasoning about Social Interactions	2019-04-22	Code
15	GPT-1 117M (fine-tuned)	63	-	No	SocialIQA: Commonsense Reasoning about Social Interactions	2019-04-22	Code
16	phi-1.5-web 1.3B (zero-shot)	53	-	No	Textbooks Are All You Need II: phi-1.5 technical report	2023-09-11	Code
17	phi-1.5 1.3B (zero-shot)	52.6	-	No	Textbooks Are All You Need II: phi-1.5 technical report	2023-09-11	Code
18	LLaMA 65B (zero-shot)	52.3	-	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
19	Chinchilla (zero-shot)	51.3	-	No	Training Compute-Optimal Large Language Models	2022-03-29	Code
20	Gopher (zero-shot)	50.6	-	No	Scaling Language Models: Methods, Analysis & Insights from Training Gopher	2021-12-08	Code
21	LLaMA 13B (zero-shot)	50.4	-	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
22	LLaMA 33B (zero-shot)	50.4	-	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
23	LLaMA 7B (zero-shot)	48.9	-	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
24	Random chance baseline	33.3	-	No	SocialIQA: Commonsense Reasoning about Social Interactions	2019-04-22	Code