Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

2023-09-29
Mathematical Reasoning, Math, Math Word Problem Solving, Imitation Learning, Mathematical Problem-Solving, Arithmetic Reasoning

Abstract

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
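The interleaving of natural language reasoning with tool use described in the abstract can be pictured as a loop: the model writes a rationale and a program, the program is executed, and its output is appended to the context before the model continues. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The `<program>` and `<output>` markers, the `generate` callable, and the names `run_program`, `tool_integrated_solve`, and `scripted_model` are all hypothetical and introduced only for illustration; the scripted model stands in for a real language model.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical markers for model-written programs (an assumption of this sketch).
PROGRAM = re.compile(r"<program>(.*?)</program>", re.DOTALL)


def run_program(source: str) -> str:
    """Run a program and return what it printed (no sandboxing; illustration only)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:  # feed errors back to the model instead of crashing
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def tool_integrated_solve(question: str,
                          generate: Callable[[str], str],
                          max_rounds: int = 3) -> str:
    """Alternate model steps with program execution until no program is emitted."""
    trajectory = question
    for _ in range(max_rounds):
        step = generate(trajectory)
        trajectory += "\n" + step
        match = PROGRAM.search(step)
        if match is None:
            break  # no tool call: treat this step as the final answer
        output = run_program(match.group(1))
        trajectory += f"\n<output>\n{output}\n</output>"
    return trajectory


def scripted_model(trajectory: str) -> str:
    """Stand-in for a language model: one tool call, then a final answer."""
    if "<output>" not in trajectory:
        return ("Add the integers from 1 to 10 with a program.\n"
                "<program>\nprint(sum(range(1, 11)))\n</program>")
    result = trajectory.rsplit("<output>", 1)[1].split("</output>")[0].strip()
    return f"The answer is {result}."


if __name__ == "__main__":
    print(tool_integrated_solve("What is 1 + 2 + ... + 10?", scripted_model))
```

In this sketch the tool output is written back into the trajectory, so the next model step can reason over the computed value instead of doing the arithmetic in text.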

Results

Dataset: MATH. The same results are listed under four tasks: Question Answering, Math Word Problem Solving, Mathematical Question Answering, and Mathematical Reasoning.

| Model | Accuracy (%) | Parameters (billions) |
|---|---|---|
| ToRA-Code 34B model (w/ code, SC, k=50) | 60 | 34 |
| ToRA 70B (w/ code, SC, k=50) | 56.9 | 70 |
| ToRA-Code 34B (w/ code) | 50.8 | 34 |
| ToRA 70B (w/ code) | 49.7 | 70 |
| ToRA-Code 13B (w/ code) | 48.1 | 13 |
| ToRA-Code 7B (w/ code) | 44.6 | 7 |
| ToRA 13B (w/ code) | 43 | 13 |
| ToRA 7B (w/ code) | 40.1 | 7 |

Dataset: GSM8K. Task: Arithmetic Reasoning.

| Model | Accuracy (%) | Parameters (billions) |
|---|---|---|
| ToRA-70B (SC, k=50) | 88.3 | 70 |
| ToRA-Code-34B (SC, k=50) | 85.1 | 34 |
| ToRA 70B | 84.3 | 70 |
| ToRA-Code 34B | 80.7 | 34 |
| ToRA-Code 13B | 75.8 | 13 |
| ToRA-Code 7B | 72.6 | 7 |
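Several result rows carry the label "SC, k=50". Assuming SC denotes self-consistency, meaning k solutions are sampled per problem and the most frequent final answer is kept, the aggregation step can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code; `self_consistency` and `sample_answer` are hypothetical names, and `sample_answer` stands in for one sampled model run that returns a final answer string or `None` when no answer could be extracted.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistency(sample_answer: Callable[[], Optional[str]],
                     k: int = 50) -> Optional[str]:
    """Sample k answers and return the most frequent one (majority vote)."""
    answers = [sample_answer() for _ in range(k)]
    # Ignore samples where no final answer was produced.
    votes = Counter(answer for answer in answers if answer is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Fixed samples stand in for stochastic model outputs.
    samples = iter(["12", "15", "12", None, "12"])
    print(self_consistency(lambda: next(samples), k=5))
```

Under this reading, the gap between a model's single-sample row and its "SC, k=50" row reflects the gain from voting over 50 samples instead of taking one.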

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner (2025-07-17)
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)