Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


An In-Context Learning Agent for Formal Theorem-Proving

Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, Swarat Chaudhuri

2023-10-06 · Automated Theorem Proving · Large Language Model · Language Modelling

Paper | PDF | Code (official)

Abstract

We present an in-context learning agent for formal theorem-proving in environments like Lean and Coq. Current state-of-the-art models for the problem are finetuned on environment-specific proof data. By contrast, our approach, called COPRA, repeatedly asks a high-capacity, general-purpose large language model (GPT-4) to propose tactic applications from within a stateful backtracking search. Proposed tactics are executed in the underlying proof environment. Feedback from the execution is used to build the prompt for the next model query, along with selected information from the search history and lemmas retrieved from an external database. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms few-shot invocations of GPT-4. It also compares favorably against finetuning-based approaches, outperforming ReProver, a state-of-the-art finetuned approach for Lean, in terms of the pass@1 metric. Our code and data are available at https://github.com/trishullab/copra.
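The abstract describes a stateful backtracking search in which an LLM proposes tactics, the proof environment executes them, and execution feedback shapes the next query. A minimal sketch of such a loop, under assumed interfaces (`propose_tactics`, `apply_tactic`, and `ProofState` are hypothetical names, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProofState:
    goals: tuple            # open goals in the proof environment
    history: tuple = ()     # tactics applied to reach this state

def copra_style_search(initial_state, propose_tactics, apply_tactic, max_steps=100):
    """Backtracking search driven by LLM tactic proposals.

    `propose_tactics(state, failed)` stands in for the language-model
    call (it receives previously failed tactics as feedback);
    `apply_tactic(state, tactic)` stands in for Lean/Coq execution and
    returns the resulting ProofState, or None if the tactic errors.
    """
    stack = [initial_state]
    failures = {}                       # state id -> tactics that failed there
    for _ in range(max_steps):
        if not stack:
            return None                 # search space exhausted
        state = stack[-1]
        proposed = propose_tactics(state, failures.get(id(state), []))
        advanced = False
        for tactic in proposed:
            new_state = apply_tactic(state, tactic)
            if new_state is None:
                # execution feedback: remember the failure for the next prompt
                failures.setdefault(id(state), []).append(tactic)
                continue
            if not new_state.goals:
                return new_state.history    # all goals closed: proof found
            stack.append(new_state)
            advanced = True
            break
        if not advanced:
            stack.pop()                 # dead end: backtrack
    return None
```

The real agent also injects retrieved lemmas and selected search history into each prompt; this sketch keeps only the search skeleton.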

Results

Task                      | Dataset      | Metric     | Value | Model
Automated Theorem Proving | miniF2F-test | Pass@1     | 30.7  | COPRA + GPT-4-turbo
Automated Theorem Proving | miniF2F-test | cumulative | 30.7  | COPRA + GPT-4-turbo
Automated Theorem Proving | miniF2F-test | Pass@1     | 23.3  | COPRA + GPT-4
Automated Theorem Proving | miniF2F-test | cumulative | 23.3  | COPRA + GPT-4
Automated Theorem Proving | miniF2F-test | Pass@1     | 11.9  | COPRA + GPT-3.5
Automated Theorem Proving | miniF2F-test | cumulative | 11.9  | COPRA + GPT-3.5
Mathematical Proofs       | miniF2F-test | Pass@1     | 30.7  | COPRA + GPT-4-turbo
Mathematical Proofs       | miniF2F-test | cumulative | 30.7  | COPRA + GPT-4-turbo
Mathematical Proofs       | miniF2F-test | Pass@1     | 23.3  | COPRA + GPT-4
Mathematical Proofs       | miniF2F-test | cumulative | 23.3  | COPRA + GPT-4
Mathematical Proofs       | miniF2F-test | Pass@1     | 11.9  | COPRA + GPT-3.5
Mathematical Proofs       | miniF2F-test | cumulative | 11.9  | COPRA + GPT-3.5
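The values above are percentages of miniF2F-test theorems proved. Pass@1 counts a theorem as solved only if the first attempt succeeds; a quick illustration of the computation on toy data (not the paper's results):

```python
def pass_at_1(first_attempt_results):
    """Percentage of theorems proved on the first attempt.

    `first_attempt_results` is one boolean per theorem: True if the
    first proof attempt for that theorem succeeded.
    """
    proved = sum(1 for ok in first_attempt_results if ok)
    return 100.0 * proved / len(first_attempt_results)

# Toy example: 3 of 4 theorems proved on the first try.
print(pass_at_1([True, True, False, True]))  # 75.0
```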
