Mathematical Reasoning on ASDiv-A

Metric: Execution Accuracy (%, higher is better)
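Execution accuracy is typically computed by executing each model's predicted equation and checking whether the result matches the gold answer. The sketch below is illustrative only. It assumes that predictions are plain arithmetic expressions over +, -, * and /, and that values are compared with a numeric tolerance of 1e-4. The names `safe_eval`, `execution_accuracy` and `tol` are hypothetical and do not come from the evaluation code of any listed paper.

```python
import ast
import operator

# Hypothetical operator whitelist: only +, -, *, / are assumed to appear.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval"))


def execution_accuracy(predicted, gold, tol=1e-4):
    """Percentage of predicted expressions whose value matches the gold answer."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = 0
    for expr, answer in zip(predicted, gold):
        try:
            value = safe_eval(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # unparsable or failing expressions count as wrong
        if abs(value - answer) <= tol:
            correct += 1
    return 100.0 * correct / len(gold)
```

For example, `execution_accuracy(["5 + 3", "12 / 4", "7 - 2"], [8, 3, 4])` scores two of three problems as correct, which gives roughly 66.7. The expressions are parsed with `ast` rather than `eval()`, so model output cannot execute arbitrary code.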

Results

#	Model	Execution Accuracy (%)	Extra Data	Paper	Date	Code available
1	ATHENA (roberta-large)	91	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Yes
2	MMOS-DeepSeekMath-7B(0-shot)	87.6	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
3	ATHENA (roberta-base)	86.4	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Yes
4	MMOS-CODE-34B(0-shot)	85.1	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
5	OpenMath-CodeLlama-70B (w/ code)	84.7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Yes
6	Graph2Tree with RoBERTa	82.2	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
7	GTS with RoBERTa	81.2	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
8	MMOS-CODE-7B(0-shot)	78.6	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
9	LSTM Seq2Seq with RoBERTa	76.9	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes

Entries flagged SOTA on the leaderboard: ATHENA (roberta-large) (#1) and Graph2Tree with RoBERTa (#6).