Question Answering on PIQA

Metric: Accuracy (higher is better)

Results

#	Model	Accuracy (%)	Extra Data	Paper	Date	Code
1	Unicorn 11B (fine-tuned)	90.1	No	UNICORN on RAINBOW: A Universal Commonsense Reas...	2021-03-24	Code
2	LLaMA-3 8B + MoSLoRA	89.7	No	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16	Code
3	CompassMTL 567M with Tailor	88.3	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
4	LLaMA-3 8B + MixLoRA	87.6	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
5	DeBERTa-Large 304M	87.4	No	Two is Better than Many? Binary Classification a...	2022-10-29	Code
6	CompassMTL 567M	87.3	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
7	LLaMA-2 13B + MixLoRA	86.8	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
8	Shakti-LLM (2.5B)	86.2	No	SHAKTI: A 2.5 Billion Parameter Small Language M...	2024-10-15	-
9	DeBERTa-Large 304M (classification-based)	85.9	No	Two is Better than Many? Binary Classification a...	2022-10-29	Code
10	ExDeBERTa 567M	85.5	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
11	UnifiedQA 3B	85.3	No	UnifiedQA: Crossing Format Boundaries With a Sin...	2020-05-02	Code
12	PaLM 2-L (1-shot)	85	No	PaLM 2 Technical Report	2023-05-17	Code
13	Mixtral 8x7B (0-shot)	83.6	No	Mixtral of Experts	2024-01-08	Code
14	PaLM 2-M (1-shot)	83.2	No	PaLM 2 Technical Report	2023-05-17	Code
15	LLaMA-2 7B + MixLoRA	83.2	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
16	Mistral 7B (0-shot)	83	No	Mistral 7B	2023-10-10	Code
17	LLaMA 65B (0-shot)	82.8	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
18	LLaMA 2 70B (0-shot)	82.8	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
19	Camelidae-8×34B	82.7	No	Parameter-Efficient Sparsity Crafting from Dense...	2024-01-05	Code
20	LLaMA 33B (0-shot)	82.3	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
21	PaLM 2-S (1-shot)	82.2	No	PaLM 2 Technical Report	2023-05-17	Code
22	Mistral 7B (0-shot)	82.2	No	Mixtral of Experts	2024-01-08	Code
23	MT-NLG 530B (0-shot)	82	No	Megatron-LM: Training Multi-Billion Parameter La...	2019-09-17	Code
24	LLaMA 2 34B (0-shot)	81.9	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
25	Gopher 280B (0-shot)	81.8	No	Scaling Language Models: Methods, Analysis & Ins...	2021-12-08	Code
26	Chinchilla 70B (0-shot)	81.8	No	Training Compute-Optimal Large Language Models	2022-03-29	Code
27	FLAN 137B (few-shot, k=10)	81.7	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
28	OPT-175B	81.07	No	SparseGPT: Massive Language Models Can Be Accura...	2023-01-02	Code
29	GPT-3 175B (0-shot)	81	No	Language Models are Few-Shot Learners	2020-05-28	Code
30	SparseGPT 175B (50% Sparsity)	80.63	No	SparseGPT: Massive Language Models Can Be Accura...	2023-01-02	Code
31	FLAN 137B (0-shot)	80.5	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
32	LLaMA 2 13B (0-shot)	80.5	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
33	LLaMA 13B (0-shot)	80.1	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
34	LLaMA 7B (0-shot)	79.8	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
35	SparseGPT 175B (4:8 Sparsity)	79.54	No	SparseGPT: Massive Language Models Can Be Accura...	2023-01-02	Code
36	SparseGPT 175B (2:4 Sparsity)	79.54	No	SparseGPT: Massive Language Models Can Be Accura...	2023-01-02	Code
37	RoBERTa-Large 355M	79.4	No	RoBERTa: A Robustly Optimized BERT Pretraining A...	2019-07-26	Code
38	LLaMA 2 7B (0-shot)	78.8	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
39	BloombergGPT 50B (1-shot)	77.9	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
40	OPT 66B (1-shot)	77.6	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
41	RoBERTa-large 355M (fine-tuned)	77.1	No	PIQA: Reasoning about Physical Commonsense in Na...	2019-11-26	Code
42	phi-1.5-web (1.3B)	77	No	Textbooks Are All You Need II: phi-1.5 technical...	2023-09-11	Code
43	BLOOM 176B (1-shot)	77	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
44	Pythia 12B (5-shot)	76.7	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
45	Open-LLaMA-3B-v2	76.2	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
46	Pythia 12B (0-shot)	76	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
47	Sheared-LLaMA-2.7B	75.8	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
48	GPT-NeoX 20B (1-shot)	75.8	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
49	Pythia 6.9B (0-shot)	75.2	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
50	Sheared-LLaMA-1.3B	73.4	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
51	sMLP - deterministic 9.4B (0-shot)	73	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
52	GPT-3 Large 760M (0-shot)	72.9	No	Language Models are Few-Shot Learners	2020-05-28	Code
53	FLAN-T5-Large 783M	72.2	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
54	LaMini-GPT 1.5B	71.3	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
55	LaMini-F-T5 783M	70.6	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
56	GPT-2-XL 1.5B	70.5	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
57	Pythia 1B (5-shot)	70.4	No	Pythia: A Suite for Analyzing Large Language Mod...	2023-04-03	Code
58	GPT-2-small 124M (fine-tuned)	69.2	No	PIQA: Reasoning about Physical Commonsense in Na...	2019-11-26	Code
59	GShard 9B	68.1	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
60	LaMini-T5 738M	67.2	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
61	BERT-large 340M (fine-tuned)	66.8	No	PIQA: Reasoning about Physical Commonsense in Na...	2019-11-26	Code
62	BERT-Large 340M	66.7	No	BERT: Pre-training of Deep Bidirectional Transfo...	2018-10-11	Code
63	BASE Layers 10B (0-shot)	63.8	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
64	Hash Layers 10B (0-shot)	63.8	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
65	T5-Large 738M	55.9	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
66	OPT-175B (50% Sparsity)	54.73	No	SparseGPT: Massive Language Models Can Be Accura...	2023-01-02	Code
67	Random chance baseline	50	No	Back to Square One: Artifact Detection, Training...	2021-04-16	Code