Metric: Prometheus-2 Answer Correctness (higher is better; rows are listed in ascending score order)
| # | Model | Prometheus-2 Answer Correctness | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-3.5-Turbo-0613-16k | 3.0408 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 2 | Command-R-v01-34B | 3.0571 | No | - | - | - |
| 3 | Llama-3-IT-8B-8k | 3.1102 | No | The Llama 3 Herd of Models | 2024-07-31 | Code |
| 4 | Llama-3-IT-8B-32k | 3.1673 | No | The Llama 3 Herd of Models | 2024-07-31 | Code |
| 5 | Mistral-v02-7B-32k | 3.4245 | No | Mistral 7B | 2023-10-10 | Code |
| 6 | GPT-4o-2024-08-06-128k | 3.4612 | No | GPT-4 Technical Report | 2023-03-15 | Code |