Mathematical Proofs on miniF2F-test

Metric: cumulative (higher is better)

Results

#	Model	cumulative	Extra Data	SOTA	Paper	Date	Code
1	Kimina-Prover-Preview	80.74	Yes	Yes	Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning	2025-04-15	Code
2	ProofAug	66	No	Yes	Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis	2025-01-30	Code
3	DeepSeek-Prover-V1.5	63.5	Yes	Yes	DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search	2024-08-15	Code
4	Subgoal-XL	56.1	Yes	No	SubgoalXL: Subgoal-based Expert Learning for Theorem Proving	2024-08-20	Code
5	DeepSeek-Prover	52	Yes	Yes	DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data	2024-05-23	-
6	Lyra + GPT-4	47.1	No	Yes	Lyra: Orchestrating Dual Correction in Automated Theorem Proving	2023-09-27	Code
7	LEGO-Prover ChatGPT	47.1	No	No	LEGO-Prover: Neural Theorem Proving with Growing Libraries	2023-10-01	Code
8	Decomposing the Enigma	45.5	No	Yes	Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving	2023-05-25	Code
9	Evariste	41	Yes	Yes	HyperTree Proof Search for Neural Theorem Proving	2022-05-23	-
10	Evariste-7d	40.6	No	No	HyperTree Proof Search for Neural Theorem Proving	2022-05-23	-
11	Evariste-1d	38.9	No	No	HyperTree Proof Search for Neural Theorem Proving	2022-05-23	-
12	DSP (540B Minerva informal)	38.9	No	No	Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs	2022-10-21	Code
13	Lean Expert Iteration	36.6	Yes	Yes	Formal Mathematics Statement Curriculum Learning	2022-02-03	Code
14	GPT-f	36.6	No	No	HyperTree Proof Search for Neural Theorem Proving	2022-05-23	-
15	Thor + expert iteration on autoformalised theorems	35.2	Yes	No	-	-	-
16	COPRA + GPT-4-turbo	30.7	No	No	An In-Context Learning Agent for Formal Theorem-Proving	2023-10-06	Code
17	Thor	29.9	No	No	Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers	2022-05-22	-
18	Lean GPT-f	29.2	No	Yes	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics	2021-08-31	Code
19	MMOS-DeepSeekMath-7B	28.3	No	No	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
20	ReProver	26.5	No	No	-	-	-
21	LLEMMA-7b	26.2	No	No	Llemma: An Open Language Model For Mathematics	2023-10-16	Code
22	LLEMMA-34b	25.8	No	No	Llemma: An Open Language Model For Mathematics	2023-10-16	Code
23	PACT (reproduced by Thor)	24.6	No	Yes	Proof Artifact Co-training for Theorem Proving with Language Models	2021-02-11	Code
24	COPRA + GPT-4	23.3	No	No	An In-Context Learning Agent for Formal Theorem-Proving	2023-10-06	Code
25	Sledgehammer + heuristics	20.9	No	No	Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs	2022-10-21	Code
26	Lean tidy	18	No	No	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics	2021-08-31	Code
27	COPRA + GPT-3.5	11.9	No	No	An In-Context Learning Agent for Formal Theorem-Proving	2023-10-06	Code
28	Sledgehammer	10.4	No	No	Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers	2022-05-22	-
29	Metamath GPT-f	1.6	No	No	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics	2021-08-31	Code
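
The SOTA tags in this listing appear consistent with a simple rule: an entry is tagged when its cumulative score is strictly higher than every score with an earlier date. That reading is an inference from the data on this page, not a documented definition. The sketch below checks it in Python. The names `ROWS` and `running_best` are illustrative, and the two undated entries (Thor + expert iteration on autoformalised theorems, ReProver) are left out on the assumption that they cannot be placed on a timeline.

```python
# (model, cumulative score, date, tagged SOTA on the page), copied from the leaderboard.
# The two undated rows are omitted (assumption: they cannot be ordered by date).
ROWS = [
    ("Kimina-Prover-Preview", 80.74, "2025-04-15", True),
    ("ProofAug", 66.0, "2025-01-30", True),
    ("DeepSeek-Prover-V1.5", 63.5, "2024-08-15", True),
    ("Subgoal-XL", 56.1, "2024-08-20", False),
    ("DeepSeek-Prover", 52.0, "2024-05-23", True),
    ("Lyra + GPT-4", 47.1, "2023-09-27", True),
    ("LEGO-Prover ChatGPT", 47.1, "2023-10-01", False),
    ("Decomposing the Enigma", 45.5, "2023-05-25", True),
    ("Evariste", 41.0, "2022-05-23", True),
    ("Evariste-7d", 40.6, "2022-05-23", False),
    ("Evariste-1d", 38.9, "2022-05-23", False),
    ("DSP (540B Minerva informal)", 38.9, "2022-10-21", False),
    ("Lean Expert Iteration", 36.6, "2022-02-03", True),
    ("GPT-f", 36.6, "2022-05-23", False),
    ("COPRA + GPT-4-turbo", 30.7, "2023-10-06", False),
    ("Thor", 29.9, "2022-05-22", False),
    ("Lean GPT-f", 29.2, "2021-08-31", True),
    ("MMOS-DeepSeekMath-7B", 28.3, "2024-02-23", False),
    ("LLEMMA-7b", 26.2, "2023-10-16", False),
    ("LLEMMA-34b", 25.8, "2023-10-16", False),
    ("PACT (reproduced by Thor)", 24.6, "2021-02-11", True),
    ("COPRA + GPT-4", 23.3, "2023-10-06", False),
    ("Sledgehammer + heuristics", 20.9, "2022-10-21", False),
    ("Lean tidy", 18.0, "2021-08-31", False),
    ("COPRA + GPT-3.5", 11.9, "2023-10-06", False),
    ("Sledgehammer", 10.4, "2022-05-22", False),
    ("Metamath GPT-f", 1.6, "2021-08-31", False),
]


def chronological(rows):
    """Order rows by date, then by descending score within the same day.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    return sorted(rows, key=lambda r: (r[2], -r[1]))


def running_best(rows):
    """Return models whose score strictly exceeds every earlier score."""
    best = float("-inf")
    leaders = []
    for model, score, _day, _tag in chronological(rows):
        if score > best:  # strict: a tie with the current best does not count
            best = score
            leaders.append(model)
    return leaders


derived = running_best(ROWS)
tagged = [model for model, _s, _d, tag in chronological(ROWS) if tag]
print("rule matches the page's SOTA tags:", derived == tagged)
```

Under this rule, LEGO-Prover ChatGPT is not tagged even though it ties Lyra + GPT-4 at 47.1, because it is dated four days later and does not strictly exceed the earlier score.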