MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers

Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, Ee-Peng Lim

2021-09-02 | Tasks: Math, Math Word Problem Solving

Abstract

Developing automatic Math Word Problem (MWP) solvers has been of interest to NLP researchers since the 1960s. Over the last few years, a growing number of datasets and deep learning-based methods have been proposed for solving MWPs effectively. However, most existing methods are benchmarked solely on one or two datasets, each under a different configuration, which leads to a lack of unified, standardized, fair, and comprehensive comparison between methods. This paper presents MWPToolkit, the first open-source framework for solving MWPs. In MWPToolkit, we decompose the procedure of existing MWP solvers into multiple core components and decouple their models into highly reusable modules. We also provide a hyper-parameter search function to boost performance. In total, we implement and compare 17 MWP solvers on 4 widely-used single-equation generation benchmarks and 2 multiple-equation generation benchmarks. These features make MWPToolkit suitable for researchers who want to reproduce advanced baseline models and develop new MWP solvers quickly. Code and documentation are available at https://github.com/LYH-YF/MWPToolkit.
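The abstract mentions a hyper-parameter search function without describing how it works. As a hedged illustration only, the Python sketch below shows exhaustive grid search, one common way such a search can be done. The names `grid_search` and `train_and_eval` are hypothetical and are not MWPToolkit's API; the toolkit's own search may use a different strategy, and the toy objective stands in for actually training and validating a solver.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    space: Dict[str, List[Any]],
    train_and_eval: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination in `space`; return the best config and its score.

    `train_and_eval` is a hypothetical callback that would train a solver with
    the given hyper-parameters and return validation accuracy (higher is better).
    """
    names = sorted(space)  # fixed key order so the search is reproducible
    best_config: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = train_and_eval(config)
        if score > best_score:  # keep the first config that achieves the max
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space and a toy objective standing in for real training.
space = {"learning_rate": [1e-3, 5e-4], "hidden_size": [256, 512]}


def toy_objective(cfg: Dict[str, Any]) -> float:
    return cfg["hidden_size"] / 1000 - cfg["learning_rate"]


best, score = grid_search(space, toy_objective)
print(best, score)
```

Exhaustive search grows multiplicatively with every hyper-parameter added to the space, so random or Bayesian search is often preferred when the space is large.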

Results

Task	Dataset	Metric	Value (%)	Model
Question Answering	Math23K	Accuracy (5-fold)	76.6	RoBERTaGen
Math Word Problem Solving	Math23K	Accuracy (5-fold)	76.6	RoBERTaGen
Mathematical Question Answering	Math23K	Accuracy (5-fold)	76.6	RoBERTaGen
Mathematical Reasoning	Math23K	Accuracy (5-fold)	76.6	RoBERTaGen
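The table reports accuracy under 5-fold cross-validation on Math23K, but this page does not spell out how that number is computed. A common convention, assumed here, is to train and test once per fold and report the mean of the five test accuracies, counting a prediction as correct when the value of the generated equation matches the gold answer. The Python sketch below illustrates that convention on made-up predictions. It assumes predictions are prefix-notation token lists with numbers already substituted back in; the helper names, the operator set, and the 1e-4 tolerance are hypothetical choices, not MWPToolkit's evaluator.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Binary operators allowed in a generated expression (an assumption; real
# solvers may also emit '^' or other operators).
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def eval_prefix(tokens: Sequence[str]) -> float:
    """Evaluate a prefix expression, e.g. ['+', '3', '*', '2', '4'] -> 11.0."""
    stack: List[float] = []
    for tok in reversed(tokens):  # scan right to left
        if tok in OPS:
            left = stack.pop()  # the leftmost operand sits on top of the stack
            right = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))  # raises ValueError on unknown tokens
    if len(stack) != 1:
        raise ValueError("malformed prefix expression")
    return stack[0]


def value_correct(pred: Sequence[str], answer: float, tol: float = 1e-4) -> bool:
    """A prediction is correct if its computed value matches the gold answer."""
    try:
        return abs(eval_prefix(pred) - answer) < tol
    except (ValueError, IndexError, ZeroDivisionError):
        return False  # unparsable or undefined expressions count as wrong


def fold_accuracy(pairs: Sequence[Tuple[Sequence[str], float]]) -> float:
    """Fraction of (prediction, answer) pairs in one test fold that are correct."""
    return sum(value_correct(p, a) for p, a in pairs) / len(pairs)


def k_fold_accuracy(folds: Sequence[Sequence[Tuple[Sequence[str], float]]]) -> float:
    """Mean of the per-fold test accuracies."""
    return mean(fold_accuracy(f) for f in folds)


# Made-up predictions for five test folds, for illustration only.
folds = [
    [(["+", "1", "2"], 3.0), (["*", "2", "3"], 6.0)],    # 2/2 correct
    [(["-", "10", "4"], 6.0), (["/", "1", "0"], 0.0)],   # division by zero is wrong
    [(["+", "2", "2"], 5.0), (["*", "3", "3"], 9.0)],    # wrong value in first pair
    [(["/", "9", "3"], 3.0), (["+", "1"], 1.0)],         # malformed expression is wrong
    [(["-", "5", "5"], 0.0), (["*", "4", "0.5"], 2.0)],  # 2/2 correct
]
acc = k_fold_accuracy(folds)
print(f"5-fold value accuracy: {acc:.1%}")  # prints 5-fold value accuracy: 70.0%
```

Averaging per-fold accuracies weights each fold equally; if fold sizes differ, pooling all predictions before dividing would give a slightly different number.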
