Sentence Completion on HellaSwag

Metric: Accuracy (higher is better)
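HellaSwag pairs each context with four candidate endings, so accuracy is the percentage of contexts where the model's top-ranked ending is the gold one. Uniform random guessing therefore scores 1/4 = 25%, which is the random chance baseline at the bottom of the table. The snippet below is a minimal sketch of that computation. It assumes each model has already produced one score per ending, such as a log-likelihood or a classifier logit. How those scores are obtained (prompting, k-shot examples, length normalization, classification heads) differs between the papers listed and is not modeled here. The function name `multiple_choice_accuracy` and the example scores are hypothetical.

```python
from typing import Sequence


def multiple_choice_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose highest-scoring ending equals the gold label.

    Hypothetical helper: `scores[i][j]` is an assumed model score for ending j of example i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for ending_scores, gold in zip(scores, labels):
        # Predict the ending with the highest score; max() keeps the first maximum,
        # so ties resolve to the lowest index.
        predicted = max(range(len(ending_scores)), key=lambda j: ending_scores[j])
        correct += int(predicted == gold)
    return 100.0 * correct / len(labels)


# Hypothetical scores for three contexts with four candidate endings each.
example_scores = [
    [-4.1, -2.3, -5.0, -3.7],  # predicts ending 1 (gold 1: correct)
    [-1.2, -3.3, -2.8, -0.9],  # predicts ending 3 (gold 0: wrong)
    [-2.0, -2.5, -1.9, -4.4],  # predicts ending 2 (gold 2: correct)
]
example_labels = [1, 0, 2]
print(round(multiple_choice_accuracy(example_scores, example_labels), 1))  # 66.7
```

Two of the three hypothetical predictions match their labels, giving 66.7%. The leaderboard figures are this same percentage measured over the full evaluation split.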



Results


#	Model	Accuracy (%)	Extra Data	Paper	Date	Code
1	CompassMTL 567M with Tailor	96.1	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
2	CompassMTL 567M	95.6	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
3	DeBERTa-Large 304M (classification-based)	95.6	No	Two is Better than Many? Binary Classification a...	2022-10-29	Code
4	GPT-4 (10-shot)	95.3	No	GPT-4 Technical Report	2023-03-15	Code
5	LLaMA3+MoSLoRA	95	No	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16	Code
6	DeBERTa-Large 304M	94.7	No	Two is Better than Many? Binary Classification a...	2022-10-29	Code
7	LLaMA-2 13B + MixLoRA	94.7	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
8	Unicorn 11B (fine-tuned)	93.9	Yes	UNICORN on RAINBOW: A Universal Commonsense Reas...	2021-03-24	Code
9	LLaMA-3 8B + MixLoRA	93.3	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
10	LLaMA-2 7B + MixLoRA	93.1	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
11	DeBERTa++	93	No	DeBERTa: Decoding-enhanced BERT with Disentangle...	2020-06-05	Code
12	ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag)	91.5	No	DiscoSense: Commonsense Reasoning with Discourse...	2022-10-22	Code
13	DBRX Instruct 132B (10-shot)	89	No	-	-	-
14	TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot)	88.3	No	-	-	-
15	ALBERT-XXL 235M	88	No	-	-	-
16	PaLM 2-L (1-shot)	87.4	No	PaLM 2 Technical Report	2023-05-17	Code
17	ELECTRA-Large 335M (fine-tuned on HellaSwag)	86.9	No	DiscoSense: Commonsense Reasoning with Discourse...	2022-10-22	Code
18	PaLM 2-M (1-shot)	86.7	No	PaLM 2 Technical Report	2023-05-17	Code
19	MUPPET RoBERTa-Large	86.4	No	Muppet: Massive Multi-task Representations with ...	2021-01-26	Code
20	LLaMA 65B + CFG (0-shot)	86.3	No	Stay on topic with Classifier-Free Guidance	2023-06-30	-
21	Falcon-180B (0-shot)	85.9	No	The Falcon Series of Open Language Models	2023-11-28	-
22	PaLM 2-S (1-shot)	85.6	No	PaLM 2 Technical Report	2023-05-17	Code
23	GPT-3.5 (10-shot)	85.5	No	GPT-4 Technical Report	2023-03-15	Code
24	RoBERTa-Large Ensemble	85.5	No	RoBERTa: A Robustly Optimized BERT Pretraining A...	2019-07-26	Code
25	LLaMA 30B + CFG (0-shot)	85.3	No	Stay on topic with Classifier-Free Guidance	2023-06-30	-
26	LLaMA 2 70B (0-shot)	85.3	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
27	HyKAS+CSKG	85	No	Towards Generalizable Neuro-Symbolic Systems for...	2019-10-30	-
28	LLaMA 65B (0-shot)	84.2	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
29	PaLM-540B (few-shot)	83.8	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
30	PaLM-540B (1-shot)	83.6	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
31	ExDeBERTa 567M	83.6	No	Task Compass: Scaling Multi-task Pre-training wi...	2022-10-12	Code
32	PaLM-540B (0-shot)	83.4	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
33	LLaMA 2 34B (0-shot)	83.3	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
34	Camelidae-8×34B (10-shot)	83.2	No	Parameter-Efficient Sparsity Crafting from Dense...	2024-01-05	Code
35	LLaMA 33B (0-shot)	82.8	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
36	Falcon-40B (0-shot)	82.7	No	The Falcon Series of Open Language Models	2023-11-28	-
37	Megatron-Turing NLG 530B (few-shot)	82.4	No	Using DeepSpeed and Megatron to Train Megatron-T...	2022-01-28	Code
38	Qwen2idae-16x14B (10-shot)	82.3	No	Parameter-Efficient Sparsity Crafting from Dense...	2024-01-05	Code
39	LLaMA 13B + CFG (0-shot)	82.1	No	Stay on topic with Classifier-Free Guidance	2023-06-30	-
40	RoBERTa-Large 355M	81.7	No	RoBERTa: A Robustly Optimized BERT Pretraining A...	2019-07-26	Code
41	Mistral 7B (0-shot)	81.3	No	Mistral 7B	2023-10-10	Code
42	Chinchilla 70B (0-shot)	80.8	No	Training Compute-Optimal Large Language Models	2022-03-29	Code
43	LLaMA 2 13B (0-shot)	80.7	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
44	Megatron-Turing NLG 530B (1-shot)	80.2	No	Using DeepSpeed and Megatron to Train Megatron-T...	2022-01-28	Code
45	GPT-3 175B (few-shot, k=32)	79.3	No	Language Models are Few-Shot Learners	2020-05-28	Code
46	Gopher 280B (0-shot)	79.2	No	Scaling Language Models: Methods, Analysis & Ins...	2021-12-08	Code
47	LLaMA 13B (0-shot)	79.2	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
48	GPT-3 (0-shot)	78.9	No	Language Models are Few-Shot Learners	2020-05-28	Code
49	LLaMA 2 7B (0-shot)	77.2	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
50	Falcon-7B (0-shot)	76.3	No	The Falcon Series of Open Language Models	2023-11-28	-
51	LLaMA 7B (0-shot)	76.1	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
52	BloombergGPT 50B (1-shot)	73.9	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
53	OPT 66B (1-shot)	73.5	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
54	BLOOM 176B (1-shot)	73.2	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
55	Sheared-LLaMA-2.7B (50B)	70.8	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
56	GPT-NeoX 20B (1-shot)	68.4	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
57	Open-LLaMA-3B-v2	67.6	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
58	Mamba-2.8B	66.1	No	Mamba: Linear-Time Sequence Modeling with Select...	2023-12-01	Code
59	Sheared-LLaMA-1.3B (50B)	60.7	No	Sheared LLaMA: Accelerating Language Model Pre-t...	2023-10-10	Code
60	FLAN 137B (3-shot)	59.2	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
61	Mamba-1.4B	59.1	No	Mamba: Linear-Time Sequence Modeling with Select...	2023-12-01	Code
62	FLAN 137B (0-shot)	56.7	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
63	sMLP – deterministic 9.4B (0-shot)	54.5	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
64	Switch Transformer 9B	52.5	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
65	GPT-3 Large 760M (0-shot)	51	No	Language Models are Few-Shot Learners	2020-05-28	Code
66	GPT-2-XL 1.5B	50.9	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
67	OPT-6.7B	50.3	No	LLM in a flash: Efficient Large Language Model I...	2023-12-12	-
68	LLM in a Flash (OPT-6.7B with Predictor)	49.8	No	LLM in a flash: Efficient Large Language Model I...	2023-12-12	-
69	FLAN-T5-Large 783M	48.7	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
70	LaMini-GPT 1.5B	48.3	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
71	BERT-Large 340M	47.3	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
72	LaMini-F-T5 783M	43.7	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
73	GPT-1 117M	41.7	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
74	Flipped-3B	41.6	No	Guess the Instruction! Flipped Learning Makes La...	2022-10-06	Code
75	T0-3B (CoT fine-tuned)	41.1	No	The CoT Collection: Improving Zero-shot and Few-...	2023-05-23	Code
76	LaMini-T5 738M	40.6	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
77	BERT-Base 110M	40.5	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
78	T5-Large 738M	38.9	No	LaMini-LM: A Diverse Herd of Distilled Models fr...	2023-04-27	Code
79	GShard 9B	38	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
80	LSTM + BERT-Base	36.2	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
81	RoE-3B	34.6	No	Exploring the Benefits of Training Expert Langua...	2023-02-07	Code
82	ESIM + ELMo	33.3	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
83	HASH Layers 10B (0-shot)	33	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
84	LSTM + GloVe	31.7	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
85	fastText	31.6	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
86	LSTM + ELMo	31.4	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code
87	Base Layers 10B (0-shot)	30.2	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
88	KiC-770M	29.6	No	Knowledge-in-Context: Towards Knowledgeable Semi...	2022-10-28	-
89	Random chance baseline	25	No	HellaSwag: Can a Machine Really Finish Your Sent...	2019-05-19	Code