Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Sentence Completion on HellaSwag

Metric: Accuracy (higher is better)
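
As a rough illustration of the metric, the sketch below computes accuracy for a multiple-choice task where each example has four candidate endings, as in HellaSwag, so uniform guessing is expected to score 25%, which matches the random chance baseline in the results. The helper names (`accuracy`, `pick_ending`) and the toy scores are hypothetical and not taken from any listed paper; individual submissions may score endings differently (for example with length-normalized likelihoods or a classification head).

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose predicted ending index equals the gold index."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def pick_ending(scores: Sequence[float]) -> int:
    """Pick the index of the highest-scoring candidate ending (first index wins ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical toy data: one row of model scores per example, four endings each.
toy_scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.4, 0.2, 0.3, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.3, 0.3, 0.2, 0.2],
]
toy_labels = [1, 0, 3, 0]  # hypothetical gold ending indices

toy_predictions = [pick_ending(s) for s in toy_scores]
toy_accuracy = accuracy(toy_predictions, toy_labels)

# Expected accuracy of uniform random guessing over four endings.
expected_random = 100.0 / 4

print(f"accuracy: {toy_accuracy:.1f}  random baseline: {expected_random:.1f}")
```

Reporting the score as a percentage keeps it on the same scale as the Accuracy column, so a toy result can be read directly against the leaderboard values.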


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | CompassMTL 567M with Tailor | 96.1 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 2 | CompassMTL 567M | 95.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 3 | DeBERTa-Large 304M (classification-based) | 95.6 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 4 | GPT-4 (10-shot) | 95.3 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 5 | LLaMA3+MoSLoRA | 95 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 6 | DeBERTa-Large 304M | 94.7 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 7 | LLaMA-2 13B + MixLoRA | 94.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 8 | Unicorn 11B (fine-tuned) | 93.9 | Yes | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 9 | LLaMA-3 8B + MixLoRA | 93.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 10 | LLaMA-2 7B + MixLoRA | 93.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 11 | DeBERTa++ | 93 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 12 | ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | 91.5 | No | DiscoSense: Commonsense Reasoning with Discourse... | 2022-10-22 | Code |
| 13 | DBRX Instruct 132B (10-shot) | 89 | No | - | - | - |
| 14 | TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot) | 88.3 | No | - | - | - |
| 15 | ALBERT-XXL 235M | 88 | No | - | - | - |
| 16 | PaLM 2-L (1-shot) | 87.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 17 | ELECTRA-Large 335M (fine-tuned on HellaSwag) | 86.9 | No | DiscoSense: Commonsense Reasoning with Discourse... | 2022-10-22 | Code |
| 18 | PaLM 2-M (1-shot) | 86.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 19 | MUPPET Roberta Large | 86.4 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 20 | LLaMA 65B + CFG (0-shot) | 86.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 21 | Falcon-180B (0-shot) | 85.9 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 22 | PaLM 2-S (1-shot) | 85.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | GPT-3.5 (10-shot) | 85.5 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 24 | RoBERTa-Large Ensemble | 85.5 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 25 | LLaMA 30B + CFG (0-shot) | 85.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 26 | LLaMA 2 70B (0-shot) | 85.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 27 | HyKAS+CSKG | 85 | No | Towards Generalizable Neuro-Symbolic Systems for... | 2019-10-30 | - |
| 28 | LLaMA 65B (0-shot) | 84.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 29 | PaLM-540B (Few-Shot) | 83.8 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 30 | PaLM-540B (1-shot) | 83.6 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 31 | ExDeBERTa 567M | 83.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 32 | PaLM-540B (0-shot) | 83.4 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 33 | LLaMA 2 34B (0-shot) | 83.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 34 | Camelidae-8×34B (10-shot) | 83.2 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 35 | LLaMA 33B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 36 | Falcon-40B (0-shot) | 82.7 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 37 | Megatron-Turing NLG 530B (Few-Shot) | 82.4 | No | Using DeepSpeed and Megatron to Train Megatron-T... | 2022-01-28 | Code |
| 38 | Qwen2idae-16x14B (10-shot) | 82.3 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 39 | LLaMA 13B + CFG (0-shot) | 82.1 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 40 | RoBERTa-Large 355M | 81.7 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 41 | Mistral 7B (0-shot) | 81.3 | No | Mistral 7B | 2023-10-10 | Code |
| 42 | Chinchilla 70B (0-shot) | 80.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 43 | LLaMA 2 13B (0-shot) | 80.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 44 | Megatron-Turing NLG 530B (1-shot) | 80.2 | No | Using DeepSpeed and Megatron to Train Megatron-T... | 2022-01-28 | Code |
| 45 | GPT-3 175B (few-shot, k=32) | 79.3 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 46 | Gopher 280B (0-shot) | 79.2 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 47 | LLaMA 13B (0-shot) | 79.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 48 | GPT-3 (0-shot) | 78.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 49 | LLaMA 2 7B (0-shot) | 77.2 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 50 | Falcon-7B (0-shot) | 76.3 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 51 | LLaMA 7B (0-shot) | 76.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 52 | BloombergGPT 50B (1-shot) | 73.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 53 | OPT 66B (1-shot) | 73.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 54 | BLOOM 176B (1-shot) | 73.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 55 | Sheared-LLaMA-2.7B (50B) | 70.8 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 56 | GPT-NeoX 20B (1-shot) | 68.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 57 | Open-LLaMA-3B-v2 | 67.6 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 58 | Mamba-2.8B | 66.1 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 59 | Sheared-LLaMA-1.3B (50B) | 60.7 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 60 | FLAN 137B (3-shot) | 59.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 61 | Mamba-1.4B | 59.1 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 62 | FLAN 137B (0-shot) | 56.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 63 | sMLP – deterministic 9.4B (0-shot) | 54.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 64 | Switch Transformer 9B | 52.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 65 | GPT-3 Large 760M (0-shot) | 51 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 66 | GPT-2-XL 1.5B | 50.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 67 | OPT-6.7B | 50.3 | No | LLM in a flash: Efficient Large Language Model I... | 2023-12-12 | - |
| 68 | LLM in a Flash (OPT-6.7B with Predictor) | 49.8 | No | LLM in a flash: Efficient Large Language Model I... | 2023-12-12 | - |
| 69 | FLAN-T5-Large 783M | 48.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 70 | LaMini-GPT 1.5B | 48.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 71 | BERT-Large 340M | 47.3 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 72 | LaMini-F-T5 783M | 43.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 73 | GPT-1 117M | 41.7 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 74 | Flipped-3B | 41.6 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 75 | T0-3B (CoT fine-tuned) | 41.1 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 76 | LaMini-T5 738M | 40.6 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 77 | BERT-Base 110M | 40.5 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 78 | T5-Large 738M | 38.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 79 | GShard 9B | 38 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 80 | LSTM + BERT-Base | 36.2 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 81 | RoE-3B | 34.6 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 82 | ESIM + ELMo | 33.3 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 83 | HASH Layers 10B (0-shot) | 33 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 84 | LSTM + GloVe | 31.7 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 85 | fastText | 31.6 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 86 | LSTM + ELMo | 31.4 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 87 | Base Layers 10B (0-shot) | 30.2 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 88 | KiC-770M | 29.6 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 89 | Random chance baseline | 25 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |