| Rank | Model | Accuracy (%) | Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Mistral-Nemo 12B (HPT) | 99.87 | No | Hierarchical Prompting Taxonomy: A Universal Eva... | 2024-06-18 | Code |
| 2 | ST-MoE-32B 269B (fine-tuned) | 92.4 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 3 | PaLM 540B (fine-tuned) | 92.2 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 5 | T5-XXL 11B (fine-tuned) | 91.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 6 | PaLM 2-L (1-shot) | 90.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 7 | UL2 20B (fine-tuned) | 90.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 8 | Vega v2 6B (fine-tuned) | 90.5 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 9 | DeBERTa-1.5B | 90.4 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 10 | PaLM 2-M (1-shot) | 88.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 11 | ST-MoE-L 4.1B (fine-tuned) | 88.6 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 12 | PaLM 2-S (1-shot) | 88.1 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 13 | MUPPET RoBERTa Large | 87.5 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 14 | FLAN 137B (prompt-tuned) | 86.3 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 15 | RoBERTa-large 355M + Entailment as Few-shot Learner | 86 | No | Entailment as Few-Shot Learner | 2021-04-29 | Code |
| 16 | T5-Large 770M (fine-tuned) | 85.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 17 | LLaMA 65B (0-shot) | 85.3 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 18 | LLaMA 2 70B (0-shot) | 85 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 19 | FLAN 137B (4-shot) | 84.6 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 20 | MUPPET RoBERTa Base | 83.8 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 21 | Chinchilla 70B (0-shot) | 83.7 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 22 | LLaMA 2 34B (0-shot) | 83.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 23 | LLaMA 33B (0-shot) | 83.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 24 | FLAN 137B (0-shot) | 82.9 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 25 | LLaMA 2 13B (0-shot) | 81.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 26 | T5-Base 220M (fine-tuned) | 81.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 27 | BERT-MultiNLI 340M (fine-tuned) | 80.4 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Code |
| 28 | Gopher (0-shot) | 79.3 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 29 | LLaMA 13B (0-shot) | 78.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 30 | LLaMA 2 7B (0-shot) | 77.4 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 31 | LLaMA-2 13B + MixLoRA | 77.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 32 | LLaMA 7B (0-shot) | 76.5 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 33 | T5-Small 60M (fine-tuned) | 76.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 34 | GPT-3 175B (few-shot, k=32) | 76.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 35 | BiDAF-MultiNLI (fine-tuned) | 75.57 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Code |
| 36 | LLaMA-3 8B + MixLoRA | 75 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 37 | BloombergGPT 50B (1-shot) | 74.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 38 | LLaMA-3 + MoSLoRA | 74.6 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 39 | GPT-1 117M (fine-tuned) | 72.87 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Code |
| 40 | LLaMA-2 7B + MixLoRA | 72.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 41 | BiDAF + ELMo (fine-tuned) | 71.41 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Code |
| 42 | OPT-IML 175B | 71.4 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 43 | AlexaTM 20B | 69.4 | No | AlexaTM 20B: Few-Shot Learning Using a Large-Sca... | 2022-08-02 | Code |
| 44 | Neo-6B (QA + WS) | 67.2 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 45 | OPT-IML 30B | 66.9 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 46 | Neo-6B (few-shot) | 66.5 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 47 | N-Grammer 343M | 65 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code |
| 48 | Neo-6B (QA) | 64.9 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 49 | OPT 30B (0-shot) | 64 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 50 | UL2 20B (0-shot) | 63.1 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 51 | Majority baseline | 62.17 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Code |
| 52 | Hybrid H3 1.3B (0-shot, logit scoring) | 61.7 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 53 | OPT-IML 1.3B (0-shot) | 61.5 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 54 | Shakti-LLM (2.5B) | 61.1 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 55 | Hybrid H3 2.7B (3-shot, logit scoring) | 60.6 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 56 | OPT 1.3B (0-shot) | 60.5 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 57 | GPT-3 175B (0-shot) | 60.5 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 58 | OPT 175B | 60.1 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Code |
| 59 | Hybrid H3 125M (0-shot, logit scoring) | 59.6 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 60 | OPT 66B (1-shot) | 57.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 61 | Hybrid H3 125M (3-shot, logit scoring) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 62 | Hybrid H3 125M (3-shot, rank classification) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 63 | BLOOM 176B (1-shot) | 52.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 64 | Hyena | 51.8 | No | Hyena Hierarchy: Towards Larger Convolutional La... | 2023-02-21 | Code |
| 65 | GPT-NeoX 20B (1-shot) | 46.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
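
The rows above report BoolQ accuracy (several entries cite the BoolQ paper, and the majority-baseline score matches the label skew of BoolQ's validation split). As a sanity check, the majority-baseline row (rank 51) can be reproduced with a minimal sketch, assuming Python and the HuggingFace `datasets` library with the `boolq` dataset as published on the Hub:

```python
from collections import Counter

from datasets import load_dataset

# BoolQ validation split: 3,270 yes/no questions with boolean labels.
val = load_dataset("boolq", split="validation")
labels = [bool(ex["answer"]) for ex in val]

# Always predict the most frequent label ("yes" on this split).
majority = Counter(labels).most_common(1)[0][0]
accuracy = sum(label == majority for label in labels) / len(labels)
print(f"majority-class accuracy: {accuracy:.2%}")  # ~62.2%, matching rank 51
```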
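
Several low-shot entries also name the scoring rule ("logit scoring" at ranks 52, 55, 59, and 61; "rank classification" at rank 62). For a yes/no task the idea is the same: score each candidate answer under the language model and pick the likelier one. A minimal sketch, assuming the `transformers` library and using `gpt2` as a hypothetical stand-in model (the leaderboard models are far larger):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# " yes" and " no" are single tokens in the GPT-2 vocabulary, so logit
# scoring reduces to comparing two entries of the next-token distribution.
YES_ID = tok(" yes").input_ids[0]
NO_ID = tok(" no").input_ids[0]

@torch.no_grad()
def predict(passage: str, question: str) -> bool:
    # Hypothetical prompt template; the cited papers each use their own.
    prompt = f"{passage}\nQuestion: {question}?\nAnswer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    next_token_logits = model(input_ids=ids).logits[0, -1]
    return bool(next_token_logits[YES_ID] > next_token_logits[NO_ID])

print(predict("The sky appears blue because of Rayleigh scattering.",
              "is the sky blue"))
```

Rank classification generalizes this to multi-token answers: each full candidate continuation is scored by its (often length-normalized) log-likelihood and the highest-scoring option is returned; for single-token answers the two rules coincide.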