| Rank | Model | Accuracy | Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ST-MoE-32B 269B (fine-tuned) | 96.1 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 2 | Unicorn 11B (fine-tuned) | 91.3 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 3 | CompassMTL 567M with Tailor | 90.5 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 4 | CompassMTL 567M | 89.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 5 | UnifiedQA 11B (fine-tuned) | 89.4 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 6 | Claude 3 Opus (5-shot) | 88.5 | No | - | - | - |
| 7 | GPT-4 (5-shot) | 87.5 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 8 | ExDeBERTa 567M | 87 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 9 | LLaMA-2 13B + MixLoRA | 86.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 10 | LLaMA3 8B+MoSLoRA | 85.8 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 11 | PaLM 2-L (1-shot) | 83 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 12 | LLaMA-3 8B + MixLoRA | 82.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 13 | ST-MoE-L 4.1B (fine-tuned) | 81.7 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 14 | GPT-3.5 (5-shot) | 81.6 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 15 | PaLM 540B (0-shot) | 81.1 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 16 | Camelidae-8×34B | 80.9 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 17 | PaLM 2-M (1-shot) | 79.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 18 | RoBERTa-Winogrande 355M (fine-tuned) | 79.1 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 19 | PaLM 2-S (1-shot) | 77.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 20 | Mixtral 8x7B (0-shot) | 77.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 21 | PaLM 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 22 | PaLM-cont 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 23 | LLaMA 65B (0-shot) | 77 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 24 | LLaMA-2 7B + MixLoRA | 76.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 25 | LLaMA 33B (0-shot) | 76 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | Mistral 7B (0-shot) | 75.3 | No | Mistral 7B | 2023-10-10 | Code |
| 27 | Claude 3 Sonnet (5-shot) | 75.1 | No | - | - | - |
| 28 | Chinchilla 70B (0-shot) | 74.9 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 29 | Claude 3 Haiku (5-shot) | 74.2 | No | - | - | - |
| 30 | Mistral 7B (0-shot) | 74.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 31 | phi-1.5-web 1.3B (0-shot) | 74 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 32 | UnifiedQA 406M (fine-tuned) | 73.3 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 33 | LLaMA 13B (0-shot) | 73 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 34 | FLAN 137B (few-shot, k=16) | 72.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 35 | G-DAUG-Combo + RoBERTa-Large | 71.4 | No | Generative Data Augmentation for Commonsense Rea... | 2020-04-24 | Code |
| 36 | FLAN 137B (0-shot) | 71.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 37 | RWKV v5 Eagle 7B | 70.8 | No | - | - | - |
| 38 | Branch-Train-MiX 4x7B (sampling top-1 expert) | 70.6 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 39 | GPT-3 175B (0-shot) | 70.2 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 40 | Gopher 280B (0-shot) | 70.1 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 41 | LLaMA 7B (0-shot) | 70.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 42 | BLOOM 176B (1-shot) | 67 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 43 | Pythia 12B (5-shot) | 66.6 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 44 | OPT 66B (1-shot) | 66.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 45 | BERT-Winogrande 345M (fine-tuned) | 64.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 46 | BloombergGPT (1-shot) | 64.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 47 | Pythia 12B (0-shot) | 63.9 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 48 | RoE-3B | 61.6 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 49 | Pythia 6.9B (0-shot) | 60.9 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 50 | GPT-NeoX (1-shot) | 60.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 51 | FLAN-T5-Large 783M | 59.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 52 | Pythia 2.8B (0-shot) | 59.4 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 53 | RoBERTa-DPR 355M (0-shot) | 58.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 54 | ALBERT-xxlarge 235M | 58.7 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 55 | Flipped-3B | 58.56 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 56 | GPT-2-XL 1.5B | 58.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | T0-3B (CoT fine-tuned) | 57.5 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 58 | GPT-3 Large 760M (0-shot) | 57.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 59 | RoBERTa-base 125M | 56.3 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 60 | LaMini-F-T5 783M | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 61 | LaMini-GPT 1.5B | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 62 | BERT-large 345M | 55.6 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 63 | KiC-770M | 55.3 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 64 | T5-Large 738M | 55.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 65 | LaMini-T5 738M | 54.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 66 | RoBERTa-large 355M | 54.9 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 67 | sMLP – deterministic 9.4B (0-shot) | 54.3 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 68 | Switch Transformer 9B (0-shot) | 53.4 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 69 | BERT-base 110M | 53.1 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 70 | ALBERT-base 11M | 52.8 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 71 | BERT-large 345M (0-shot) | 51.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 72 | HASH Layers 10B (0-shot) | 51.7 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 73 | Gshard 9B (0-shot) | 51.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 74 | Base Layers 10B (0-shot) | 51 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 75 | BERT-DPR 345M (0-shot) | 51 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 76 | Random baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 77 | RoBERTa-large 355M (0-shot) | 50 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |