TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Formal Mathematics Statement Curriculum Learning

Formal Mathematics Statement Curriculum Learning

Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever

2022-02-03Automated Theorem ProvingLanguage Modelling
PaperPDFCode(official)

Abstract

We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.

Results

TaskDatasetMetricValueModel
Automated Theorem ProvingminiF2F-testPass@129.6Lean Expert Iteration
Automated Theorem ProvingminiF2F-testPass@3234.5Lean Expert Iteration
Automated Theorem ProvingminiF2F-testPass@6436.6Lean Expert Iteration
Automated Theorem ProvingminiF2F-testcumulative36.6Lean Expert Iteration
Mathematical ProofsminiF2F-testPass@129.6Lean Expert Iteration
Mathematical ProofsminiF2F-testPass@3234.5Lean Expert Iteration
Mathematical ProofsminiF2F-testPass@6436.6Lean Expert Iteration
Mathematical ProofsminiF2F-testcumulative36.6Lean Expert Iteration

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21Making Language Model a Hierarchical Classifier and Generator2025-07-17VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17Assay2Mol: large language model-based drug design using BioAssay context2025-07-16Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing2025-07-16