Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cumulative Reasoning with Large Language Models

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao

2023-08-08 · Math · Math Word Problem Solving · Logical Reasoning · Decision Making

Paper · PDF · Code (official)

Abstract

Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
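The abstract describes CR as an iterative loop: the LM proposes new propositions from the premises and previously accepted propositions, candidates are verified before being added to the context, and the loop stops once the question can be answered. The sketch below illustrates that control flow only; the `propose`, `verify`, and `answers` callables are hypothetical stand-ins for the paper's LM-backed roles, not its actual prompts or interfaces.

```python
# Minimal sketch of a cumulative-reasoning loop, assuming LM-backed
# propose/verify/answers components supplied by the caller.
from typing import Callable, List

def cumulative_reasoning(
    premises: List[str],
    propose: Callable[[List[str]], str],       # derive one new proposition
    verify: Callable[[List[str], str], bool],  # accept or reject a candidate
    answers: Callable[[List[str]], bool],      # can the context answer the question?
    max_steps: int = 10,
) -> List[str]:
    context = list(premises)
    for _ in range(max_steps):
        if answers(context):
            break
        candidate = propose(context)
        if verify(context, candidate):  # only verified propositions accumulate
            context.append(candidate)
    return context

# Toy run with rule-based stand-ins for the LM components.
if __name__ == "__main__":
    ctx = cumulative_reasoning(
        ["All men are mortal.", "Socrates is a man."],
        propose=lambda c: "Socrates is mortal.",
        verify=lambda c, p: p not in c,
        answers=lambda c: "Socrates is mortal." in c,
    )
    print(ctx[-1])  # "Socrates is mortal."
```

The key design point mirrored from the abstract is that each accepted proposition becomes part of the context for later steps, so the chain of verified intermediate results grows cumulatively rather than being regenerated from scratch.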

Results

Task | Dataset | Metric | Value | Model
Question Answering | MATH | Accuracy | 72.2 | CR (GPT-4-turbo model, w/ code)
Question Answering | MATH | Accuracy | 58 | CR (GPT-4 model, w/o code)
Math Word Problem Solving | MATH | Accuracy | 72.2 | CR (GPT-4-turbo model, w/ code)
Math Word Problem Solving | MATH | Accuracy | 58 | CR (GPT-4 model, w/o code)
Mathematical Question Answering | MATH | Accuracy | 72.2 | CR (GPT-4-turbo model, w/ code)
Mathematical Question Answering | MATH | Accuracy | 58 | CR (GPT-4 model, w/o code)
Mathematical Reasoning | MATH | Accuracy | 72.2 | CR (GPT-4-turbo model, w/ code)
Mathematical Reasoning | MATH | Accuracy | 58 | CR (GPT-4 model, w/o code)

Related Papers

Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Higher-Order Pattern Unification Modulo Similarity Relations (2025-07-17)
Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming (2025-07-17)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)