Mathematical Reasoning on FrontierMath
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | o3 | 0.252 | No | - | - | - |
| 2 | Gemini 1.5 Pro (002) | 0.02 | No | FrontierMath: A Benchmark for Evaluating Advance... | 2024-11-07 | - |
| 3 | Claude 3.5 Sonnet | 0.01 | No | - | - | - |
| 4 | o1-preview | 0.01 | No | - | - | - |
| 5 | o1-mini | 0.01 | No | - | - | - |
| 6 | GPT-4o | 0.01 | No | - | - | - |