Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan

2023-06-24 | Mathematical Reasoning | Math | Math Word Problem Solving
Paper | PDF | Code (official)

Abstract

The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP), a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers based on the generation of linguistic variants of the problem text. The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) as the encoder to leverage its rich textual representations and enhanced mask decoder to construct the solution expressions. Furthermore, we introduce a challenging dataset, ParaMAWPS, consisting of paraphrased, adversarial, and inverse variants of selectively sampled MWPs from the benchmark MAWPS dataset. We extensively experiment on this dataset along with other benchmark datasets using some baseline MWP solver models. We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model. We make our code and data publicly available.
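
The voting step described in the abstract (solve each linguistic variant, then elect the expression predicted most often) can be illustrated with a minimal sketch. The names `vote_on_expression` and `solve` are hypothetical and do not come from the authors' code; `solve` stands in for any MWP solver that maps a problem text to an expression string, and the tie-breaking rule shown is an assumption, not something the abstract specifies.

```python
from collections import Counter
from typing import Callable, Sequence


def vote_on_expression(
    variants: Sequence[str],
    solve: Callable[[str], str],
) -> str:
    """Solve every variant and return the most frequently predicted expression.

    `solve` is a placeholder for a trained solver (hypothetical interface).
    """
    if not variants:
        raise ValueError("need at least one problem variant")
    # One predicted expression per linguistic variant of the same problem.
    predictions = [solve(text) for text in variants]
    # Assumption: ties go to the expression that was predicted first,
    # which is how Counter.most_common orders equal counts.
    winner, _count = Counter(predictions).most_common(1)[0]
    return winner


# Toy stand-in solver: a lookup table instead of a neural model.
toy_predictions = {
    "Tom had 8 apples and ate 3. How many are left?": "8 - 3",
    "After eating 3 of his 8 apples, how many does Tom have?": "8 - 3",
    "Tom ate 3 apples out of 8. How many remain?": "8 + 3",
}
elected = vote_on_expression(list(toy_predictions), toy_predictions.__getitem__)
print(elected)
```

In this toy run two of the three variants agree, so the outlier prediction is outvoted; that is the intuition behind voting as a robustness mechanism.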

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | ParaMAWPS | Accuracy (%) | 79.1 | DeBERTa (VM) |
| Question Answering | ParaMAWPS | Accuracy (%) | 73 | GPT-3.5 Turbo (175B) |
| Question Answering | ParaMAWPS | Accuracy (%) | 5.9 | GPT-J (6B) |
| Question Answering | ParaMAWPS | Accuracy (%) | 4.2 | GPT-3 text-curie-001 (13B) |
| Question Answering | ParaMAWPS | Accuracy (%) | 3.21 | GPT-3 text-babbage-001 (6.7B) |
| Question Answering | MAWPS | Accuracy (%) | 91 | DeBERTa (PM + VM) |
| Question Answering | MAWPS | Accuracy (%) | 80.3 | GPT-3.5 Turbo (175B) |
| Question Answering | MAWPS | Accuracy (%) | 9.9 | GPT-J |
| Question Answering | MAWPS | Accuracy (%) | 4.09 | GPT-3 text-curie-001 (13B) |
| Question Answering | MAWPS | Accuracy (%) | 2.76 | GPT-3 text-babbage-001 (6.7B) |
| Question Answering | SVAMP | Accuracy | 63.5 | DeBERTa |
| Question Answering | SVAMP | Execution Accuracy | 63.5 | DeBERTa |
| Math Word Problem Solving | ParaMAWPS | Accuracy (%) | 79.1 | DeBERTa (VM) |
| Math Word Problem Solving | ParaMAWPS | Accuracy (%) | 73 | GPT-3.5 Turbo (175B) |
| Math Word Problem Solving | ParaMAWPS | Accuracy (%) | 5.9 | GPT-J (6B) |
| Math Word Problem Solving | ParaMAWPS | Accuracy (%) | 4.2 | GPT-3 text-curie-001 (13B) |
| Math Word Problem Solving | ParaMAWPS | Accuracy (%) | 3.21 | GPT-3 text-babbage-001 (6.7B) |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 91 | DeBERTa (PM + VM) |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 80.3 | GPT-3.5 Turbo (175B) |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 9.9 | GPT-J |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 4.09 | GPT-3 text-curie-001 (13B) |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 2.76 | GPT-3 text-babbage-001 (6.7B) |
| Math Word Problem Solving | SVAMP | Accuracy | 63.5 | DeBERTa |
| Math Word Problem Solving | SVAMP | Execution Accuracy | 63.5 | DeBERTa |
| Mathematical Question Answering | ParaMAWPS | Accuracy (%) | 79.1 | DeBERTa (VM) |
| Mathematical Question Answering | ParaMAWPS | Accuracy (%) | 73 | GPT-3.5 Turbo (175B) |
| Mathematical Question Answering | ParaMAWPS | Accuracy (%) | 5.9 | GPT-J (6B) |
| Mathematical Question Answering | ParaMAWPS | Accuracy (%) | 4.2 | GPT-3 text-curie-001 (13B) |
| Mathematical Question Answering | ParaMAWPS | Accuracy (%) | 3.21 | GPT-3 text-babbage-001 (6.7B) |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 91 | DeBERTa (PM + VM) |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 80.3 | GPT-3.5 Turbo (175B) |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 9.9 | GPT-J |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 4.09 | GPT-3 text-curie-001 (13B) |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 2.76 | GPT-3 text-babbage-001 (6.7B) |
| Mathematical Question Answering | SVAMP | Accuracy | 63.5 | DeBERTa |
| Mathematical Question Answering | SVAMP | Execution Accuracy | 63.5 | DeBERTa |
| Mathematical Reasoning | ParaMAWPS | Accuracy (%) | 79.1 | DeBERTa (VM) |
| Mathematical Reasoning | ParaMAWPS | Accuracy (%) | 73 | GPT-3.5 Turbo (175B) |
| Mathematical Reasoning | ParaMAWPS | Accuracy (%) | 5.9 | GPT-J (6B) |
| Mathematical Reasoning | ParaMAWPS | Accuracy (%) | 4.2 | GPT-3 text-curie-001 (13B) |
| Mathematical Reasoning | ParaMAWPS | Accuracy (%) | 3.21 | GPT-3 text-babbage-001 (6.7B) |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 91 | DeBERTa (PM + VM) |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 80.3 | GPT-3.5 Turbo (175B) |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 9.9 | GPT-J |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 4.09 | GPT-3 text-curie-001 (13B) |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 2.76 | GPT-3 text-babbage-001 (6.7B) |
| Mathematical Reasoning | SVAMP | Accuracy | 63.5 | DeBERTa |
| Mathematical Reasoning | SVAMP | Execution Accuracy | 63.5 | DeBERTa |

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)