Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan

2023-06-24 | Mathematical Reasoning, Math, Math Word Problem Solving

Abstract

The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP), a crucial stride towards general AI. Existing models tend to rely on shallow heuristics and spurious correlations to derive the solution expressions. To mitigate this issue, we propose a framework for MWP solvers based on the generation of linguistic variants of the problem text. The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) as the encoder to leverage its rich textual representations and enhanced mask decoder to construct the solution expressions. Furthermore, we introduce a challenging dataset, ParaMAWPS, consisting of paraphrased, adversarial, and inverse variants of selectively sampled MWPs from the benchmark MAWPS dataset. We experiment extensively on this dataset, along with other benchmark datasets, using several baseline MWP solver models. We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model. We make our code and data publicly available.
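The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `paraphrase` and `solve` are hypothetical callables standing in for a variant generator and an MWP solver, expressions are compared as plain strings with no normalization, and the tie-breaking rule (first expression seen wins) is an assumption of this example.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def majority_vote(predictions: Iterable[str]) -> Optional[str]:
    """Return the most frequent predicted expression, or None if empty.

    Assumption of this sketch: ties go to the expression seen first,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(predictions)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def solve_with_variants(
    problem: str,
    paraphrase: Callable[[str], List[str]],  # hypothetical variant generator
    solve: Callable[[str], str],             # hypothetical MWP solver
) -> Optional[str]:
    """Solve the original problem and each variant, then vote."""
    variants = [problem, *paraphrase(problem)]
    return majority_vote(solve(v) for v in variants)


if __name__ == "__main__":
    # Toy stand-ins: two variants, and a solver that errs on one of them.
    toy_paraphrase = lambda p: [p + " v1", p + " v2"]
    toy_solve = lambda t: "8 - 3" if t.endswith("v1") else "8 + 3"
    print(solve_with_variants("Tom has 8 apples and gets 3 more.",
                              toy_paraphrase, toy_solve))
```

In the toy run, the solver returns a wrong expression for one variant, but the other two predictions agree and outvote it. A practical version would likely canonicalize expressions before counting, so that equivalent forms such as `8 + 3` and `3 + 8` pool their votes.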

Results

The results below are listed identically under four task labels: Question Answering, Math Word Problem Solving, Mathematical Question Answering, and Mathematical Reasoning.

Dataset	Metric	Value	Model
ParaMAWPS	Accuracy (%)	79.1	DeBERTa (VM)
ParaMAWPS	Accuracy (%)	73	GPT-3.5 Turbo (175B)
ParaMAWPS	Accuracy (%)	5.9	GPT-J (6B)
ParaMAWPS	Accuracy (%)	4.2	GPT-3 text-curie-001 (13B)
ParaMAWPS	Accuracy (%)	3.21	GPT-3 text-babbage-001 (6.7B)
MAWPS	Accuracy (%)	91	DeBERTa (PM + VM)
MAWPS	Accuracy (%)	80.3	GPT-3.5 Turbo (175B)
MAWPS	Accuracy (%)	9.9	GPT-J
MAWPS	Accuracy (%)	4.09	GPT-3 text-curie-001 (13B)
MAWPS	Accuracy (%)	2.76	GPT-3 text-babbage-001 (6.7B)
SVAMP	Accuracy	63.5	DeBERTa
SVAMP	Execution Accuracy	63.5	DeBERTa
