TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/MiniF2F: a cross-system benchmark for formal Olympiad-leve...

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, Stanislas Polu

2021-08-31ICLR 2022 4Automated Theorem Proving
PaperPDFCodeCodeCode(official)Code

Abstract

We present miniF2F, a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, Isabelle (partially) and HOL Light (partially) and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3 and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving.

Results

TaskDatasetMetricValueModel
Automated Theorem ProvingminiF2F-validPass@123.9Lean GPT-f
Automated Theorem ProvingminiF2F-validPass@829.3Lean GPT-f
Automated Theorem ProvingminiF2F-validPass@11Metamath GPT-f
Automated Theorem ProvingminiF2F-validPass@82Metamath GPT-f
Automated Theorem ProvingminiF2F-validPass@116.8Lean tidy
Automated Theorem ProvingminiF2F-testPass@124.6Lean GPT-f
Automated Theorem ProvingminiF2F-testPass@3229.2Lean GPT-f
Automated Theorem ProvingminiF2F-testcumulative29.2Lean GPT-f
Automated Theorem ProvingminiF2F-testPass@118Lean tidy
Automated Theorem ProvingminiF2F-testcumulative18Lean tidy
Automated Theorem ProvingminiF2F-testPass@11.3Metamath GPT-f
Automated Theorem ProvingminiF2F-testcumulative1.6Metamath GPT-f
Mathematical ProofsminiF2F-validPass@123.9Lean GPT-f
Mathematical ProofsminiF2F-validPass@829.3Lean GPT-f
Mathematical ProofsminiF2F-validPass@11Metamath GPT-f
Mathematical ProofsminiF2F-validPass@82Metamath GPT-f
Mathematical ProofsminiF2F-validPass@116.8Lean tidy
Mathematical ProofsminiF2F-testPass@124.6Lean GPT-f
Mathematical ProofsminiF2F-testPass@3229.2Lean GPT-f
Mathematical ProofsminiF2F-testcumulative29.2Lean GPT-f
Mathematical ProofsminiF2F-testPass@118Lean tidy
Mathematical ProofsminiF2F-testcumulative18Lean tidy
Mathematical ProofsminiF2F-testPass@11.3Metamath GPT-f
Mathematical ProofsminiF2F-testcumulative1.6Metamath GPT-f

Related Papers

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization2025-07-08Prover Agent: An Agent-based Framework for Formal Mathematical Proofs2025-06-24Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving2025-06-20MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?2025-06-06Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification2025-06-05LeanExplore: A search engine for Lean 4 declarations2025-06-04Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening2025-06-03ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction2025-05-30