| Rank | Model | Accuracy (%) | Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | CompassMTL 567M with Tailor | 96.1 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 2 | CompassMTL 567M | 95.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 3 | DeBERTa-Large 304M (classification-based) | 95.6 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 4 | GPT-4 (10-shot) | 95.3 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 5 | LLaMA3+MoSLoRA | 95.0 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 6 | DeBERTa-Large 304M | 94.7 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 7 | LLaMA-2 13B + MixLoRA | 94.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 8 | Unicorn 11B (fine-tuned) | 93.9 | Yes | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 9 | LLaMA-3 8B + MixLoRA | 93.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 10 | LLaMA-2 7B + MixLoRA | 93.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 11 | DeBERTa++ | 93.0 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 12 | ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | 91.5 | No | DiscoSense: Commonsense Reasoning with Discourse... | 2022-10-22 | Code |
| 13 | DBRX Instruct 132B (10-shot) | 89.0 | No | - | - | - |
| 14 | TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot) | 88.3 | No | - | - | - |
| 15 | ALBERT-XXL 235M | 88.0 | No | - | - | - |
| 16 | PaLM 2-L (1-shot) | 87.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 17 | ELECTRA-Large 335M (fine-tuned on HellaSwag) | 86.9 | No | DiscoSense: Commonsense Reasoning with Discourse... | 2022-10-22 | Code |
| 18 | PaLM 2-M (1-shot) | 86.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 19 | MUPPET RoBERTa Large | 86.4 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 20 | LLaMA 65B + CFG (0-shot) | 86.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 21 | Falcon-180B (0-shot) | 85.9 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 22 | PaLM 2-S (1-shot) | 85.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | GPT-3.5 (10-shot) | 85.5 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 24 | RoBERTa-Large Ensemble | 85.5 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 25 | LLaMA 30B + CFG (0-shot) | 85.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 26 | LLaMA 2 70B (0-shot) | 85.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 27 | HyKAS+CSKG | 85.0 | No | Towards Generalizable Neuro-Symbolic Systems for... | 2019-10-30 | - |
| 28 | LLaMA 65B (0-shot) | 84.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 29 | PaLM-540B (few-shot) | 83.8 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 30 | PaLM-540B (1-shot) | 83.6 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 31 | ExDeBERTa 567M | 83.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 32 | PaLM-540B (0-shot) | 83.4 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 33 | LLaMA 2 34B (0-shot) | 83.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 34 | Camelidae-8×34B (10-shot) | 83.2 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 35 | LLaMA 33B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 36 | Falcon-40B (0-shot) | 82.7 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 37 | Megatron-Turing NLG 530B (few-shot) | 82.4 | No | Using DeepSpeed and Megatron to Train Megatron-T... | 2022-01-28 | Code |
| 38 | Qwen2idae-16x14B (10-shot) | 82.3 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 39 | LLaMA 13B + CFG (0-shot) | 82.1 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - |
| 40 | RoBERTa-Large 355M | 81.7 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 41 | Mistral 7B (0-shot) | 81.3 | No | Mistral 7B | 2023-10-10 | Code |
| 42 | Chinchilla 70B (0-shot) | 80.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 43 | LLaMA 2 13B (0-shot) | 80.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 44 | Megatron-Turing NLG 530B (1-shot) | 80.2 | No | Using DeepSpeed and Megatron to Train Megatron-T... | 2022-01-28 | Code |
| 45 | GPT-3 175B (few-shot, k=32) | 79.3 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 46 | Gopher 280B (0-shot) | 79.2 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 47 | LLaMA 13B (0-shot) | 79.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 48 | GPT-3 (0-shot) | 78.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 49 | LLaMA 2 7B (0-shot) | 77.2 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 50 | Falcon-7B (0-shot) | 76.3 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 51 | LLaMA 7B (0-shot) | 76.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 52 | BloombergGPT 50B (1-shot) | 73.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 53 | OPT 66B (1-shot) | 73.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 54 | BLOOM 176B (1-shot) | 73.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 55 | Sheared-LLaMA-2.7B (50B) | 70.8 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 56 | GPT-NeoX 20B (1-shot) | 68.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 57 | Open-LLaMA-3B-v2 | 67.6 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 58 | Mamba-2.8B | 66.1 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 59 | Sheared-LLaMA-1.3B (50B) | 60.7 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 60 | FLAN 137B (3-shot) | 59.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 61 | Mamba-1.4B | 59.1 | No | Mamba: Linear-Time Sequence Modeling with Select... | 2023-12-01 | Code |
| 62 | FLAN 137B (0-shot) | 56.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 63 | sMLP – deterministic 9.4B (0-shot) | 54.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 64 | Switch Transformer 9B | 52.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 65 | GPT-3 Large 760M (0-shot) | 51.0 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 66 | GPT-2-XL 1.5B | 50.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 67 | OPT-6.7B | 50.3 | No | LLM in a flash: Efficient Large Language Model I... | 2023-12-12 | - |
| 68 | LLM in a Flash (OPT-6.7B with Predictor) | 49.8 | No | LLM in a flash: Efficient Large Language Model I... | 2023-12-12 | - |
| 69 | FLAN-T5-Large 783M | 48.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 70 | LaMini-GPT 1.5B | 48.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 71 | BERT-Large 340M | 47.3 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 72 | LaMini-F-T5 783M | 43.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 73 | GPT-1 117M | 41.7 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 74 | Flipped-3B | 41.6 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 75 | T0-3B (CoT fine-tuned) | 41.1 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 76 | LaMini-T5 738M | 40.6 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 77 | BERT-Base 110M | 40.5 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 78 | T5-Large 738M | 38.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 79 | GShard 9B | 38.0 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 80 | LSTM + BERT-Base | 36.2 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 81 | RoE-3B | 34.6 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 82 | ESIM + ELMo | 33.3 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 83 | HASH Layers 10B (0-shot) | 33.0 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 84 | LSTM + GloVe | 31.7 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 85 | fastText | 31.6 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 86 | LSTM + ELMo | 31.4 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
| 87 | Base Layers 10B (0-shot) | 30.2 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 88 | KiC-770M | 29.6 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 89 | Random chance baseline | 25.0 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |
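
For reference, the accuracies above follow the standard HellaSwag protocol: each context comes with four candidate endings (hence the 25.0 random-chance baseline), the model scores every ending, and the highest-scoring ending counts as the prediction. Below is a minimal sketch of that scoring loop, not any particular implementation from the papers listed; `score_ending` is a hypothetical stand-in for a real model's (typically length-normalized) log-likelihood of an ending given its context.

```python
# Minimal sketch of 4-way multiple-choice accuracy as used on HellaSwag.
# `score_ending(context, ending)` is assumed to return a higher value for
# endings the model finds more likely (e.g., length-normalized log-prob).
from typing import Callable, Sequence


def hellaswag_accuracy(
    examples: Sequence[dict],
    score_ending: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the argmax-scored ending is the gold one."""
    correct = 0
    for ex in examples:
        scores = [score_ending(ex["ctx"], end) for end in ex["endings"]]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == ex["label"])
    return correct / len(examples)


if __name__ == "__main__":
    # Toy usage with a dummy random scorer: chance over 4 endings lands
    # near 25%, matching the baseline in the last row of the table.
    import random

    dummy = [
        {"ctx": "c", "endings": ["a", "b", "c", "d"], "label": 0}
        for _ in range(1000)
    ]
    acc = hellaswag_accuracy(dummy, lambda ctx, end: random.random())
    print(f"random-chance accuracy ≈ {acc:.1%}")
```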