Metric: Best-of (higher is better)
| # | Model | Best-of | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4 | 0.5 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 2 | Gemma | 0.41 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 3 | Baseline | 0.41 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 4 | Mistral | 0.36 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |
| 5 | Llama2 | 0.34 | No | Benchmarking Llama2, Mistral, Gemma and GPT for ... | 2024-04-15 | Code |