Metric: F1 (higher is better)
| # | Model | F1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | T5-3B | 92.5 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 2 | T5-Large | 92.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 3 | T5-11B | 91.9 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 4 | MT-DNN-SMART | 91.7 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 5 | BigBird | 91.5 | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Code |
| 6 | Charformer-Tall | 91.4 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 7 | RoBERTa-large 355M + Entailment as Few-shot Learner | 91 | No | Entailment as Few-Shot Learner | 2021-04-29 | Code |
| 8 | T5-Base | 90.7 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 9 | T5-Small | 89.7 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 10 | BERT-LARGE | 89.3 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
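
The F1 column above appears to be reported on a 0-100 scale. The table does not say how the score is computed, so the following is only a minimal sketch, assuming the standard binary F1 (the harmonic mean of precision and recall over the positive class). The function name `f1_score` and the counts passed to it are hypothetical and are not taken from any model in the table.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return binary F1 as a percentage (assumed 0-100 scale).

    tp, fp, fn are true-positive, false-positive, and false-negative
    counts for the positive class.
    """
    if tp == 0:
        # With no true positives, precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example = f1_score(tp=90, fp=10, fn=10)
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why it is often preferred over accuracy when the classes are imbalanced.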