Learning Multi-Step Reasoning by Solving Arithmetic Tasks

Tianduo Wang, Wei Lu

2023-06-02 | Mathematical Reasoning, Math, Math Word Problem Solving

Abstract

Mathematical reasoning is regarded as a necessary ability for Language Models (LMs). Recent work has demonstrated the impressive performance of large LMs in solving math problems. This success is attributed to their Chain-of-Thought (CoT) reasoning ability, i.e., the ability to decompose complex questions into step-by-step reasoning chains, but this ability seems to emerge only in models with abundant parameters. This work investigates how to equip relatively small LMs with multi-step reasoning capabilities. We propose to inject such abilities by continually pre-training LMs on a synthetic dataset, MsAT, which is composed of Multi-step Arithmetic Tasks. Our experiments on four math word problem datasets show the effectiveness of the proposed method in enhancing LMs' math reasoning abilities.
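To make the idea of a synthetic multi-step arithmetic task concrete, the sketch below generates toy problems in which each new variable is defined from earlier ones, paired with a step-by-step solution chain and a final answer. This is an illustrative sketch under stated assumptions, not the MsAT format or generator from the paper: the question template, the variable naming (`A`, `B`, `C`, ...), the operator set, the value range, and the helper names `make_task` and `solve` are all hypothetical choices made here for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical operator set; the real MsAT tasks may use different operations.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


@dataclass
class ArithmeticTask:
    question: str      # facts plus the final query, e.g. "A = 5. B = 3. C = A + B. What is C?"
    steps: list[str]   # one intermediate computation per reasoning step
    answer: int


def fmt(v: int) -> str:
    """Parenthesize negative numbers so a step like '3 - (-2)' stays readable."""
    return f"({v})" if v < 0 else str(v)


def make_task(num_steps: int, rng: random.Random) -> ArithmeticTask:
    """Build a hypothetical task needing `num_steps` chained operations (num_steps <= 24)."""
    names = [chr(ord("A") + i) for i in range(num_steps + 2)]
    values = {names[0]: rng.randint(1, 20), names[1]: rng.randint(1, 20)}
    facts = [f"{n} = {values[n]}" for n in names[:2]]
    steps = []
    for i in range(num_steps):
        target = names[i + 2]
        # Combine two distinct, already-defined variables.
        left, right = rng.sample(names[: i + 2], 2)
        op = rng.choice(list(OPS))
        values[target] = OPS[op](values[left], values[right])
        facts.append(f"{target} = {left} {op} {right}")
        steps.append(
            f"{target} = {fmt(values[left])} {op} {fmt(values[right])} = {values[target]}"
        )
    last = names[num_steps + 1]
    question = ". ".join(facts) + f". What is {last}?"
    return ArithmeticTask(question, steps, values[last])


def solve(question: str) -> int:
    """Recompute the answer by evaluating each fact in order (a sanity checker)."""
    *facts, query = question.split(". ")
    env: dict[str, int] = {}
    for fact in facts:
        target, expr = fact.split(" = ")
        tokens = expr.split()
        if len(tokens) == 1:
            env[target] = int(tokens[0])
        else:
            left, op, right = tokens
            env[target] = OPS[op](env[left], env[right])
    name = query.removeprefix("What is ").rstrip("?")
    return env[name]


if __name__ == "__main__":
    task = make_task(3, random.Random(0))
    print(task.question)
    for step in task.steps:
        print(step)
    print("Answer:", task.answer)
```

Each generated step depends on values computed in earlier steps, so the final answer can only be reached by working through the chain in order; this is the kind of multi-step structure the abstract says MsAT is meant to teach through continued pre-training. Requires Python 3.9+ for `str.removeprefix`.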

Results

Dataset	Metric	Value	Model
MAWPS	Accuracy (%)	94.3	MsAT-DeductReasoner
SVAMP	Execution Accuracy	48.9	MsAT-DeductReasoner

Both results are listed under each of the tasks Question Answering, Math Word Problem Solving, Mathematical Question Answering, and Mathematical Reasoning.