Metric: F1 (higher is better)
| # | Model | F1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLM 540B (finetuned) | 100 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 2 | Vega v2 6B (KD-based prompt transfer) | 98.6 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 3 | Turing NLR v5 XXL 5.4B (fine-tuned) | 95.9 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 4 | DeBERTa-1.5B | 94.9 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 5 | T5-XXL 11B (fine-tuned) | 93.9 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 6 | T5-Large 770M (fine-tuned) | 90.3 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 7 | T5-Base 220M (fine-tuned) | 86.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 8 | N-Grammer 343M | 59.7 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code |
| 9 | GPT-3 175B (few-shot, k=32) | 52 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
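For reference, the F1 metric ranked above is the harmonic mean of precision and recall, so it rewards models that balance the two. A minimal sketch of the computation from raw counts (the counts here are illustrative, not from any of the papers above):

```python
# F1 = harmonic mean of precision and recall (higher is better).
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are found
    return 2 * precision * recall / (precision + recall)

# Example: 90 true positives, 10 false positives, 10 false negatives
print(round(f1_score(90, 10, 10), 3))  # → 0.9
```

Because the harmonic mean is dominated by the smaller of the two terms, a model cannot reach the high scores at the top of this table by trading recall for precision or vice versa.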