Math Word Problem Solving on SVAMP

Metric: Execution Accuracy (higher is better)
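Execution Accuracy is the percentage of test problems for which the model's final answer matches the gold answer. For systems that emit an equation or program, the answer is the result of running it. The papers listed here each use their own answer-matching protocol, so the following is only a minimal sketch under stated assumptions. It assumes predictions are plain arithmetic expressions and uses a numeric tolerance of 1e-4; the helper names `execute` and `execution_accuracy` are hypothetical, and this is not the official SVAMP evaluation script.

```python
import ast
import math
import operator

# Operators the evaluator will execute. Assumption: each prediction is a plain
# arithmetic expression such as "(5 + 3) * 2" (hypothetical output format).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def execute(expr: str) -> float:
    """Evaluate an arithmetic expression by walking its AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))


def execution_accuracy(predictions, golds, tol=1e-4):
    """Percentage of predictions whose executed value matches the gold answer."""
    correct = 0
    for expr, gold in zip(predictions, golds):
        try:
            value = execute(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # a prediction that fails to execute counts as wrong
        if math.isclose(value, gold, abs_tol=tol):
            correct += 1
    return 100.0 * correct / len(golds)


preds = ["(5 + 3) * 2", "12 / 4", "7 - 10", "1 / 0"]
golds = [16, 3, 4, 2]
print(execution_accuracy(preds, golds))  # → 50.0
```

Walking the syntax tree instead of calling `eval()` means a malformed or malicious prediction can only fail, never run arbitrary code.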


Results


#	Model	Execution Accuracy	Extra Data	SOTA	Paper	Date	Code
1	GPT-4 (Teaching-Inspired)	93.9	No	Yes	Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models	2024-10-10	Yes
2	GPT-4 (Model Selection)	93.7	No	Yes	Automatic Model Selection with Large Language Models for Reasoning	2023-05-23	Yes
3	Qwen2 (CoT + Code Interpreter)	92.3	No	No	No paper	-	No
4	GPT-4 (PHP)	91.9	No	Yes	Progressive-Hint Prompting Improves Reasoning in Large Language Models	2023-04-19	Yes
5	OpenMath-CodeLlama-70B (w/ code)	87.8	Yes	No	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Yes
6	MathCoder-L-70B	84.9	Yes	No	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Yes
7	PoT_Eng (self-consistency @ 5)	83.7	No	No	No paper	-	Yes
8	CoT_Eng (self-consistency @ 5)	82.5	No	No	No paper	-	Yes
9	MMOS-CODE-34B (0-shot)	80.6	Yes	No	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
10	MMOS-DeepSeekMath-7B (0-shot)	79.3	Yes	No	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
11	MMOS-CODE-7B (0-shot)	76.4	Yes	No	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Yes
12	LLaMA 2-Chat	69.2	No	No	Llama 2: Open Foundation and Fine-Tuned Chat Models	2023-07-18	Yes
13	DeBERTa	63.5	No	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Yes
14	PaLM (zero-shot, CoT)	62.1	No	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Yes
15	PaLM (zero-shot)	58.8	No	No	Large Language Models are Zero-Shot Reasoners	2022-05-24	Yes
16	SYRELM (Vicuna 13B)	56.65	Yes	No	Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	2023-12-09	Yes
17	ATHENA (roberta-large)	54.8	No	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Yes
18	MsAT-DeductReasoner	48.9	No	No	Learning Multi-Step Reasoning by Solving Arithmetic Tasks	2023-06-02	Yes
19	Roberta-DeductReasoner	47.3	No	Yes	Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	2022-03-19	Yes
20	ATHENA (roberta-base)	45.6	No	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Yes
21	Graph2Tree with RoBERTa	43.8	Yes	Yes	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
22	GTS with RoBERTa	41.0	Yes	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
23	LSTM Seq2Seq with RoBERTa	40.3	Yes	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
24	SYRELM (GPT-J)	40.1	Yes	No	Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	2023-12-09	Yes
25	Transformer with RoBERTa	38.9	Yes	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Yes
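Entries #7 and #8 (PoT_Eng and CoT_Eng) are reported with self-consistency @ 5. In the usual sense of the term, the model samples five reasoning chains or programs per problem, and the most frequent final answer becomes its prediction. The minimal sketch below shows that majority vote. The rounding to 4 decimal places and the tie-break (first answer seen wins) are illustrative assumptions, not settings taken from those entries, and `self_consistency` is a hypothetical helper name.

```python
from collections import Counter
from typing import Optional, Sequence


def self_consistency(samples: Sequence[Optional[float]], k: int = 5,
                     ndigits: int = 4) -> Optional[float]:
    """Return the most frequent final answer among the first k samples.

    Samples that produced no answer (None) are skipped. Values are rounded so
    that 12.0 and 12.00000001 count as the same vote. Ties go to the answer
    encountered first, which is how Counter.most_common orders equal counts.
    """
    votes = Counter(round(s, ndigits) for s in samples[:k] if s is not None)
    if not votes:
        return None  # every sample failed to produce an answer
    return votes.most_common(1)[0][0]


samples = [12.0, 15.0, 12.0, None, 12.00000001]
print(self_consistency(samples))  # → 12.0
```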

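The entries flagged SOTA on this leaderboard (#21, #19, #14, #4, #2 and #1) are consistent with a running maximum over paper dates: an entry is flagged when its score beats every earlier-dated entry. The sketch below reproduces that flagging from the table's dated rows. The rule is inferred from the data, not the leaderboard's documented algorithm. Undated rows (#3, #7, #8) are left out because they cannot be placed on the timeline, and `sota_progression` is a hypothetical helper name.

```python
from datetime import date

# (model, execution accuracy, paper date) for every dated row in the table.
ENTRIES = [
    ("GPT-4 (Teaching-Inspired)", 93.9, "2024-10-10"),
    ("GPT-4 (Model Selection)", 93.7, "2023-05-23"),
    ("GPT-4 (PHP)", 91.9, "2023-04-19"),
    ("OpenMath-CodeLlama-70B (w/ code)", 87.8, "2024-02-15"),
    ("MathCoder-L-70B", 84.9, "2023-10-05"),
    ("MMOS-CODE-34B (0-shot)", 80.6, "2024-02-23"),
    ("MMOS-DeepSeekMath-7B (0-shot)", 79.3, "2024-02-23"),
    ("MMOS-CODE-7B (0-shot)", 76.4, "2024-02-23"),
    ("LLaMA 2-Chat", 69.2, "2023-07-18"),
    ("DeBERTa", 63.5, "2023-06-24"),
    ("PaLM (zero-shot, CoT)", 62.1, "2022-05-24"),
    ("PaLM (zero-shot)", 58.8, "2022-05-24"),
    ("SYRELM (Vicuna 13B)", 56.65, "2023-12-09"),
    ("ATHENA (roberta-large)", 54.8, "2023-11-02"),
    ("MsAT-DeductReasoner", 48.9, "2023-06-02"),
    ("Roberta-DeductReasoner", 47.3, "2022-03-19"),
    ("ATHENA (roberta-base)", 45.6, "2023-11-02"),
    ("Graph2Tree with RoBERTa", 43.8, "2021-03-12"),
    ("GTS with RoBERTa", 41.0, "2021-03-12"),
    ("LSTM Seq2Seq with RoBERTa", 40.3, "2021-03-12"),
    ("SYRELM (GPT-J)", 40.1, "2023-12-09"),
    ("Transformer with RoBERTa", 38.9, "2021-03-12"),
]


def sota_progression(entries):
    """Return the entries that raised the best score so far, in date order."""
    # Sort by date; within one date put the higher score first so that only
    # the best same-day entry can set a new record.
    ordered = sorted(entries, key=lambda e: (date.fromisoformat(e[2]), -e[1]))
    best = float("-inf")
    records = []
    for name, score, day in ordered:
        if score > best:
            best = score
            records.append((name, score, day))
    return records


for name, score, day in sota_progression(ENTRIES):
    print(f"{day}  {score:5.1f}  {name}")
```

Sorting higher scores first within a date matters for 2022-05-24, where both PaLM rows beat the previous record but only the CoT variant is flagged.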