Metric: F1 (higher is better)
| # | Model | F1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | U-PaLM 62B (fine-tuned) | 88.5 | No | Transcending Scaling Laws with 0.1% Extra Compute | 2022-10-20 | - |
| 2 | ByT5 XXL | 75.3 | No | ByT5: Towards a token-free future with pre-train... | 2021-05-28 | Code |
| 3 | PaLM 2-L (one-shot) | 73.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 4 | PaLM 2-S (one-shot) | 73.3 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 5 | PaLM 2-M (one-shot) | 73.3 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 6 | Decoupled | 58.1 | No | Rethinking embedding coupling in pre-trained lan... | 2020-10-24 | Code |