| Rank | Model | Accuracy (%) | Extra Training Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Unicorn 11B (fine-tuned) | 90.1 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 2 | LLaMA-3 8B + MoSLoRA | 89.7 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 3 | CompassMTL 567M with Tailor | 88.3 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 4 | LLaMA-3 8B + MixLoRA | 87.6 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 5 | DeBERTa-Large 304M | 87.4 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 6 | CompassMTL 567M | 87.3 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 7 | LLaMA-2 13B + MixLoRA | 86.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 8 | Shakti-LLM (2.5B) | 86.2 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 9 | DeBERTa-Large 304M (classification-based) | 85.9 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 10 | ExDeBERTa 567M | 85.5 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 11 | UnifiedQA 3B | 85.3 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 12 | PaLM 2-L (1-shot) | 85 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 13 | Mixtral 8x7B (0-shot) | 83.6 | No | Mixtral of Experts | 2024-01-08 | Code |
| 14 | PaLM 2-M (1-shot) | 83.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 15 | LLaMA-2 7B + MixLoRA | 83.2 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 16 | Mistral 7B (0-shot) | 83 | No | Mistral 7B | 2023-10-10 | Code |
| 17 | LLaMA 65B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 18 | LLaMA 2 70B (0-shot) | 82.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 19 | Camelidae-8×34B | 82.7 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 20 | LLaMA 33B (0-shot) | 82.3 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 21 | PaLM 2-S (1-shot) | 82.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 22 | Mistral 7B (0-shot) | 82.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 23 | MT-NLG 530B (0-shot) | 82 | No | Megatron-LM: Training Multi-Billion Parameter La... | 2019-09-17 | Code |
| 24 | LLaMA 2 34B (0-shot) | 81.9 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 25 | Gopher 280B (0-shot) | 81.8 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 26 | Chinchilla 70B (0-shot) | 81.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 27 | FLAN 137B (few-shot, k=10) | 81.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 28 | OPT-175B | 81.07 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 29 | GPT-3 175B (0-shot) | 81 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 30 | SparseGPT 175B (50% Sparsity) | 80.63 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 31 | FLAN 137B (0-shot) | 80.5 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 32 | LLaMA 2 13B (0-shot) | 80.5 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 33 | LLaMA 13B (0-shot) | 80.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 34 | LLaMA 7B (0-shot) | 79.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 35 | SparseGPT 175B (4:8 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 36 | SparseGPT 175B (2:4 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 37 | RoBERTa-Large 355M | 79.4 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 38 | LLaMA 2 7B (0-shot) | 78.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 39 | BloombergGPT 50B (1-shot) | 77.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 40 | OPT 66B (1-shot) | 77.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 41 | RoBERTa-large 355M (fine-tuned) | 77.1 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 42 | phi-1.5-web (1.3B) | 77 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 43 | BLOOM 176B (1-shot) | 77 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 44 | Pythia 12B (5-shot) | 76.7 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 45 | Open-LLaMA-3B-v2 | 76.2 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 46 | Pythia 12B (0-shot) | 76 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 47 | Sheared-LLaMA-2.7B | 75.8 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 48 | GPT-NeoX 20B (1-shot) | 75.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 49 | Pythia 6.9B (0-shot) | 75.2 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 50 | Sheared-LLaMA-1.3B | 73.4 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 51 | sMLP - deterministic 9.4B (0-shot) | 73 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 52 | GPT-3 Large 760M (0-shot) | 72.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 53 | FLAN-T5-Large 783M | 72.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 54 | LaMini-GPT 1.5B | 71.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 55 | LaMini-F-T5 783M | 70.6 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 56 | GPT-2-XL 1.5B | 70.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | Pythia 1B (5-shot) | 70.4 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 58 | GPT-2-small 124M (fine-tuned) | 69.2 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 59 | Gshard 9B | 68.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 60 | LaMini-T5 738M | 67.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 61 | BERT-large 340M (fine-tuned) | 66.8 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 62 | BERT-Large 340M | 66.7 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 63 | Base Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 64 | HASH Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 65 | T5-Large 738M | 55.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 66 | OPT-175B (50% Sparsity) | 54.73 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 67 | Random chance baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | Code |
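
The entries above are consistent with the PIQA setup cited in several rows: each example pairs a goal with two candidate solutions, which is why the random-chance baseline sits at 50%. For the 0-shot rows, the standard recipe is to score both candidates with the language model's log-likelihood and pick the higher one. Below is a minimal sketch of that procedure, assuming the Hugging Face `transformers` and `datasets` libraries and the PIQA dataset on the Hub; the `gpt2` checkpoint is a hypothetical stand-in, and individual papers differ in prompt format and normalization, so this will not exactly reproduce any row above.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# `gpt2` is a stand-in; swap in the checkpoint whose row you want to probe.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def solution_logprob(goal: str, solution: str) -> float:
    """Sum of log-probs of the solution tokens, conditioned on the goal."""
    prompt_len = len(tokenizer(goal).input_ids)
    full_ids = tokenizer(goal + " " + solution, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # token_scores[i] = log P(token i+1 | tokens 0..i); keep solution tokens only.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_scores = logprobs.gather(1, full_ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_scores[prompt_len - 1:].sum().item()

piqa = load_dataset("piqa", split="validation")  # fields: goal, sol1, sol2, label
n, correct = 200, 0  # small sample for a quick sanity check
for ex in piqa.select(range(n)):
    scores = [solution_logprob(ex["goal"], ex[key]) for key in ("sol1", "sol2")]
    correct += int(scores.index(max(scores)) == ex["label"])
print(f"0-shot accuracy on {n} validation examples: {correct / n:.3f}")
```

Note that many evaluations length-normalize the candidate scores (dividing the summed log-probability by token or character count) before comparing them, which can flip near-ties between a short and a long solution; that choice alone can move reported accuracy by a point or more.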