Mathematical Question Answering on SVAMP (1:N)

Metric: Execution Accuracy (higher is better)

#  Model                   Execution Accuracy  Extra Data  Paper                                                Date        Code
1  ATHENA (roberta-large)  67.8                No          ATHENA: Mathematical Reasoning with Thought Expa...  2023-11-02  Code
2  ATHENA (roberta-base)   52.5                No          ATHENA: Mathematical Reasoning with Thought Expa...  2023-11-02  Code