Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Code Generation on HumanEval

Metric: Pass@1 (higher is better)
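The leaderboard reports Pass@1 but does not state how each entry computed it (for example, one greedy sample per problem or several sampled completions). A minimal sketch, assuming the unbiased pass@k estimator published with HumanEval (Chen et al., 2021): for one problem with n generated samples, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), and the benchmark score is the mean over problems. The function names `pass_at_k` and `benchmark_pass_at_k` and the toy (n, c) counts are hypothetical and only illustrate the formula.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: the k in pass@k, with 1 <= k <= n
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failing samples to fill a size-k subset, so every
        # subset contains at least one correct sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean per-problem pass@k, expressed as a percentage (hypothetical helper)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical (n, c) counts for a 4-problem toy benchmark, not real results.
toy_results = [(10, 10), (10, 5), (10, 0), (10, 1)]
print(benchmark_pass_at_k(toy_results, k=1))
```

For k = 1 the estimator reduces to c / n per problem, so with a single greedy sample per problem Pass@1 is simply the percentage of the 164 HumanEval problems whose completion passes all tests.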

Results

| # | Model | Pass@1 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1 (MGDebugger) | 100 | No | From Code to Correctness: Closing the Last Mile ... | 2024-10-02 | Yes |
| 2 | LLaMA 3 | 99.4 | No | Debug like a Human: A Large Language Model Debug... | 2024-02-25 | Yes |
| 3 | QualityFlow (Sonnet-3.5) | 98.8 | No | QualityFlow: An Agentic Workflow for Program Syn... | 2025-01-20 | - |
| 4 | Phi-2 | 98.2 | No | Planning-Driven Programming: A Large Language Mo... | 2024-11-21 | Yes |
| 5 | EG-CFG (DeepSeek-V3-0324) | 96.95 | No | Execution Guided Line-by-Line Code Generation | 2025-06-12 | Yes |
| 6 | Mistral 7B | 93.9 | No | MapCoder: Multi-Agent Code Generation for Compet... | 2024-05-18 | Yes |
| 7 | Claude Sonnet 3.5 | 90.85 | No | - | - | - |
| 8 | L2MAC (GPT-4) | 90.2 | No | L2MAC: Large Language Model Automatic Computer f... | 2023-10-02 | Yes |