Mathematical Reasoning on FrontierMath

Metric: Accuracy (higher is better)

#  Model                 Accuracy  Extra Data  Paper                                                Date        Code
1  o3                    0.252     No          -                                                    -           -
2  Gemini 1.5 Pro (002)  0.02      No          FrontierMath: A Benchmark for Evaluating Advance...  2024-11-07  -
3  Claude 3.5 Sonnet     0.01      No          -                                                    -           -
4  o1-preview            0.01      No          -                                                    -           -
5  o1-mini               0.01      No          -                                                    -           -
6  GPT-4o                0.01      No          -                                                    -           -