Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

Subhabrata Dutta, Joykirat Singh, Ishan Pandey, Sunny Manchanda, Soumen Chakrabarti, Tanmoy Chakraborty

2023-12-09 · Mathematical Reasoning · Math Word Problem Solving · Arithmetic Reasoning

Paper · PDF · Code (official)

Abstract

Large Language Models (LLMs) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thought (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs of exorbitant size (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be introduced around the same time as arithmetic word problems to modularize language-to-formulation, symbolic manipulation of the formulation, and endgame arithmetic. In this paper, we start with the hypothesis that much smaller LMs, which are weak at multi-step reasoning, can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task. In our architecture, which we call SYRELM, the LM serves the role of a translator, mapping natural language arithmetic questions into a formal language (FL) description. A symbolic solver then evaluates the FL expression to obtain the answer. A small frozen LM, equipped with an efficient low-rank adapter, is capable of generating FL expressions that incorporate natural language descriptions of the arithmetic problem (e.g., variable names and their purposes, formal expressions combining variables, etc.). We adopt policy-gradient reinforcement learning to train the adapted LM, informed by the non-differentiable symbolic solver. This marks a sharp departure from recent developments in tool-augmented LLMs, in which the external tools (e.g., calculator, Web search, etc.) are essentially detached from the learning phase of the LM. SYRELM shows massive improvements over base LMs (e.g., +30.65 absolute points in accuracy on the SVAMP dataset using the GPT-J 6B model), while keeping our testbed easy to diagnose and interpret, and within reach of most researchers.

Results

Task | Dataset | Metric | Value | Model
Question Answering | SVAMP | Execution Accuracy | 56.65 | SYRELM (Vicuna 13B)
Question Answering | SVAMP | Execution Accuracy | 40.1 | SYRELM (GPT-J)
Math Word Problem Solving | SVAMP | Execution Accuracy | 56.65 | SYRELM (Vicuna 13B)
Math Word Problem Solving | SVAMP | Execution Accuracy | 40.1 | SYRELM (GPT-J)
Mathematical Question Answering | SVAMP | Execution Accuracy | 56.65 | SYRELM (Vicuna 13B)
Mathematical Question Answering | SVAMP | Execution Accuracy | 40.1 | SYRELM (GPT-J)
Mathematical Reasoning | SVAMP | Execution Accuracy | 56.65 | SYRELM (Vicuna 13B)
Mathematical Reasoning | SVAMP | Execution Accuracy | 40.1 | SYRELM (GPT-J)
Arithmetic Reasoning | GSM8K | Accuracy | 35.2 | Vicuna (SYRELM)
Arithmetic Reasoning | GSM8K | Parameters (Billion) | 13 | Vicuna (SYRELM)

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning (2025-07-11)
Integrating External Tools with Large Language Models to Improve Accuracy (2025-07-09)
Skywork-R1V3 Technical Report (2025-07-08)