Metric: Execution Accuracy (higher is better)
| Rank | Model | Execution Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ATHENA (roberta-large) | 91 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 2 | MMOS-DeepSeekMath-7B(0-shot) | 87.6 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 3 | ATHENA (roberta-base) | 86.4 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 4 | MMOS-CODE-34B(0-shot) | 85.1 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 5 | OpenMath-CodeLlama-70B (w/ code) | 84.7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 6 | Graph2Tree with RoBERTa | 82.2 | No | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 7 | GTS with RoBERTa | 81.2 | No | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 8 | MMOS-CODE-7B(0-shot) | 78.6 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 9 | LSTM Seq2Seq with RoBERTa | 76.9 | No | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |