Mathematical Reasoning on MAWPS

Metric: Accuracy (%) (higher is better)
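
The page does not say how accuracy is scored, so the following is only a minimal sketch. It assumes accuracy is the percentage of problems whose predicted numeric answer matches the reference answer within a small tolerance. The function name `answer_accuracy` and the tolerance values are hypothetical, not taken from any listed paper.

```python
import math


def answer_accuracy(predicted, reference, rel_tol=1e-4):
    """Return the percentage of problems answered correctly.

    Assumption: an answer counts as correct when it is numerically close
    to the reference; a prediction of None (no answer) counts as wrong.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(
        p is not None and math.isclose(p, r, rel_tol=rel_tol, abs_tol=1e-6)
        for p, r in zip(predicted, reference)
    )
    return 100.0 * correct / len(reference)


# Two of the four toy answers match, so this prints 50.0.
print(answer_accuracy([5.0, 12.0, None, 7.5], [5.0, 11.0, 3.0, 7.5]))
```

Individual papers may score answers differently (for example by comparing equations rather than final values), so numbers in the table are not guaranteed to follow this exact rule.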

Results

#	Model	Accuracy (%)	Extra Data	SOTA	Paper	Date	Code
1	OpenMath-CodeLlama-70B (w/ code)	95.7	Yes	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
2	MsAT-DeductReasoner	94.3	No	Yes	Learning Multi-Step Reasoning by Solving Arithmetic Tasks	2023-06-02	Code
3	ATHENA (roberta-large)	93	No	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Code
4	Multi-view	92.3	Yes	Yes	Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem	2022-10-21	Code
5	Exp-Tree	92.3	No	No	An Expression Tree Decoding Strategy for Mathematical Equation Generation	2023-10-14	Code
6	ATHENA (roberta-base)	92.2	No	No	ATHENA: Mathematical Reasoning with Thought Expansion	2023-11-02	Code
7	Roberta-DeductReasoner	92	No	Yes	Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	2022-03-19	Code
8	DeBERTa (PM + VM)	91	Yes	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Code
9	EPT	88.7	No	No	-	-	Code
10	Graph2Tree with RoBERTa	88.7	No	Yes	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Code
11	GTS with RoBERTa	88.5	No	No	Are NLP Models really able to Solve Simple Math Word Problems?	2021-03-12	Code
12	GEO	85.1	No	No	-	-	-
13	EPT-X	84.57	No	No	-	-	Code
14	EPT	84.51	No	No	-	-	Code
15	Graph2Tree	83.7	No	No	-	-	Code
16	LLaMA 2-Chat	82.4	No	No	Llama 2: Open Foundation and Fine-Tuned Chat Models	2023-07-18	Code
17	GPT-3.5 turbo (175B)	80.3	No	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Code
18	Toolformer	44	No	No	-	-	-
19	GPT-3 (175B)	19.8	No	No	-	-	-
20	Toolformer (disabled)	15	No	No	-	-	-
21	GPT-J	9.9	No	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Code
22	GPT-J + CC	9.3	No	No	-	-	-
23	OPT (66B)	7.9	No	No	-	-	-
24	GPT-3 text-curie-001 (13B)	4.09	No	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Code
25	GPT-3 text-babbage-001 (6.7B)	2.76	No	No	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023-06-24	Code
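
Because some entries use extra training data, it can be useful to compare only those that do not. The sketch below is an illustration under stated assumptions: it hard-codes the top four leaderboard rows as tab-separated text (rank, model, accuracy, extra data), and the helper name `best_without_extra_data` is hypothetical.

```python
import csv
import io

# Top four leaderboard rows: rank, model, accuracy (%), extra data.
ROWS = "\n".join([
    "1\tOpenMath-CodeLlama-70B (w/ code)\t95.7\tYes",
    "2\tMsAT-DeductReasoner\t94.3\tNo",
    "3\tATHENA (roberta-large)\t93\tNo",
    "4\tMulti-view\t92.3\tYes",
])


def best_without_extra_data(tsv_text):
    """Return (model, accuracy) for the best entry whose Extra Data is 'No'."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    entries = [
        (model, float(accuracy))
        for _rank, model, accuracy, extra in reader
        if extra == "No"
    ]
    if not entries:
        raise ValueError("no entries without extra data")
    return max(entries, key=lambda entry: entry[1])


print(best_without_extra_data(ROWS))  # ('MsAT-DeductReasoner', 94.3)
```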