Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Lyra: Orchestrating Dual Correction in Automated Theorem Proving

Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li

2023-09-27 · Automated Theorem Proving · Hallucination
Paper · PDF · Code (official)

Abstract

Large Language Models (LLMs) present an intriguing avenue for exploration in formal theorem proving. Nevertheless, their full potential, particularly for mitigating hallucinations and refining proofs with prover error messages, has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in this field, we introduce Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). Tool Correction post-processes formal proofs, leveraging prior knowledge of predefined prover tools (e.g., Sledgehammer) to guide the replacement of incorrectly invoked tools. Tool Correction significantly helps mitigate hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism that interacts with the prover to refine formal proof conjectures using prover error messages. Compared to previous refinement frameworks, the proposed Conjecture Correction refines generation with instructions but does not collect paired (generation, error & refinement) prompts. Our method achieves state-of-the-art (SOTA) performance on both the miniF2F validation set (48.0% -> 55.3%) and test set (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-processing for hallucination mitigation) and Conjecture Correction (subgoal adjustment through interaction with the environment) could provide a promising avenue for future research in this field.
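The two mechanisms the abstract describes can be sketched as a simple loop: Tool Correction rewrites unrecognized tactic invocations using a predefined tool (e.g., falling back to Sledgehammer), while Conjecture Correction re-prompts the generator with the prover's error message. The sketch below is illustrative only, not the paper's implementation; `tool_correction`, `conjecture_correction`, the tool list, and the stub prover interface are hypothetical stand-ins.

```python
# Illustrative sketch of Lyra-style dual correction (hypothetical names;
# not the authors' actual code).

PREDEFINED_TOOLS = ["sledgehammer", "simp", "auto", "blast"]  # assumed tool set

def tool_correction(proof_lines, known_tactics):
    """Replace tactic calls the prover does not recognize with a
    predefined fallback tool (here: the first predefined tool)."""
    corrected = []
    for line in proof_lines:
        parts = line.split()
        tactic = parts[0] if parts else ""
        if tactic and tactic not in known_tactics:
            # Hallucinated tactic: substitute the fallback tool instead.
            corrected.append(PREDEFINED_TOOLS[0])
        else:
            corrected.append(line)
    return corrected

def conjecture_correction(generate, check, statement, max_rounds=3):
    """Re-prompt the generator with the prover's error message until the
    proof checks or the round budget is exhausted."""
    feedback = None
    for _ in range(max_rounds):
        proof = generate(statement, feedback)
        ok, error = check(proof)
        if ok:
            return proof
        feedback = error  # feed the prover error back into the next prompt
    return None  # give up after max_rounds failed attempts
```

In this reading, Tool Correction needs no extra LLM calls (it is pure post-processing), while Conjecture Correction spends additional generator calls guided by prover feedback; the paper's distinction is that the feedback is used as an instruction rather than collected as paired (generation, error & refinement) prompts.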

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Automated Theorem Proving | miniF2F-valid | Pass@100 | 52 | Lyra + GPT-4 |
| Automated Theorem Proving | miniF2F-test | Pass@100 | 47.1 | Lyra + GPT-4 |
| Automated Theorem Proving | miniF2F-test | cumulative | 47.1 | Lyra + GPT-4 |
| Mathematical Proofs | miniF2F-valid | Pass@100 | 52 | Lyra + GPT-4 |
| Mathematical Proofs | miniF2F-test | Pass@100 | 47.1 | Lyra + GPT-4 |
| Mathematical Proofs | miniF2F-test | cumulative | 47.1 | Lyra + GPT-4 |

Related Papers

- Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
- ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)
- CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)
- UQLM: A Python Package for Uncertainty Quantification in Large Language Models (2025-07-08)
- DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning (2025-07-07)
- ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding (2025-07-07)
- The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems (2025-07-02)
- GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models (2025-07-01)