Metric: Best-of (higher is better)
| # | Model | Best-of | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Baseline | 0.92 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 2 | GPT-4 | 0.91 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 3 | Gemma | 0.91 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 4 | Mistral | 0.87 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 5 | Llama2 | 0.86 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |