TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Draft, Sketch, and Prove: Guiding Formal Theorem Provers w...

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample

2022-10-21Automated Theorem ProvingMathematical ProofsLanguage Modelling
PaperPDFCodeCode(official)Code(official)

Abstract

The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems.

Results

TaskDatasetMetricValueModel
Automated Theorem ProvingminiF2F-validPass@10043.9DSP (62B Minerva informal)
Automated Theorem ProvingminiF2F-testPass@10038.9DSP (540B Minerva informal)
Automated Theorem ProvingminiF2F-testcumulative38.9DSP (540B Minerva informal)
Automated Theorem ProvingminiF2F-testPass@120.9Sledgehammer + heuristics
Automated Theorem ProvingminiF2F-testcumulative20.9Sledgehammer + heuristics
Mathematical ProofsminiF2F-validPass@10043.9DSP (62B Minerva informal)
Mathematical ProofsminiF2F-testPass@10038.9DSP (540B Minerva informal)
Mathematical ProofsminiF2F-testcumulative38.9DSP (540B Minerva informal)
Mathematical ProofsminiF2F-testPass@120.9Sledgehammer + heuristics
Mathematical ProofsminiF2F-testcumulative20.9Sledgehammer + heuristics

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21Making Language Model a Hierarchical Classifier and Generator2025-07-17VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17Assay2Mol: large language model-based drug design using BioAssay context2025-07-16Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing2025-07-16