TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Toward Self-Improvement of LLMs via Imagination, Searching...

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

2024-04-18Mathematical ReasoningMathMath Word Problem SolvingGSM8KArithmetic ReasoningSelf-Learning
PaperPDFCode(official)

Abstract

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involves complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. In light of this, self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet, the efficacy of LLMs in self-refining its response, particularly in complex reasoning and planning task, remains dubious. In this paper, we introduce AlphaLLM for the self-improvements of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLM for self-improvement, including data scarcity, the vastness search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM is comprised of prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results in mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.

Results

TaskDatasetMetricValueModel
Question AnsweringMATHAccuracy51AlphaLLM (with MCTS)
Math Word Problem SolvingMATHAccuracy51AlphaLLM (with MCTS)
Mathematical Question AnsweringMATHAccuracy51AlphaLLM (with MCTS)
Mathematical ReasoningMATHAccuracy51AlphaLLM (with MCTS)
Arithmetic ReasoningGSM8KAccuracy92AlphaLLM (with MCTS)
Arithmetic ReasoningGSM8KParameters (Billion)70AlphaLLM (with MCTS)
GSM8KGSM8KAccuracy92AlphaLLM (with MCTS)

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks2025-07-17QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation2025-07-17GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems2025-07-17A Survey of Deep Learning for Geometry Problem Solving2025-07-16Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training2025-07-16DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression2025-07-16KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?2025-07-15Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding2025-07-15