Metric: Pass@1 (higher is better)

| Rank | Model | Pass@1 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1 (MGDebugger) | 100 | No | From Code to Correctness: Closing the Last Mile ... | 2024-10-02 | Code |
| 2 | LLaMA 3 | 99.4 | No | Debug like a Human: A Large Language Model Debug... | 2024-02-25 | Code |
| 3 | QualityFlow (Sonnet-3.5) | 98.8 | No | QualityFlow: An Agentic Workflow for Program Syn... | 2025-01-20 | - |
| 4 | Phi-2 | 98.2 | No | Planning-Driven Programming: A Large Language Mo... | 2024-11-21 | Code |
| 5 | EG-CFG (DeepSeek-V3-0324) | 96.95 | No | Execution Guided Line-by-Line Code Generation | 2025-06-12 | Code |
| 6 | Mistral 7B | 93.9 | No | MapCoder: Multi-Agent Code Generation for Compet... | 2024-05-18 | Code |
| 7 | Claude Sonnet 3.5 | 90.85 | No | - | - | - |
| 8 | L2MAC (GPT-4) | 90.2 | No | L2MAC: Large Language Model Automatic Computer f... | 2023-10-02 | Code |
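
For readers unfamiliar with the metric, the sketch below shows one common way a Pass@1 percentage like those in the table can be computed. It uses the widely cited unbiased pass@k estimator, `1 - C(n-c, k) / C(n, k)`, averaged over problems. The function names and the input shape (a list of `(n_samples, n_correct)` pairs) are assumptions made for illustration; the table does not state how each entry was evaluated, and individual papers may use a different protocol (for example, a single greedy sample per problem).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem given n samples, c of which pass the tests."""
    if n - c < k:
        # Fewer than k failing samples: any draw of k samples contains a pass.
        return 1.0
    # Probability that at least one of k samples drawn without replacement passes.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all problems, as a percentage.

    `results` is a hypothetical input format: one (n_samples, n_correct)
    pair per benchmark problem.
    """
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


if __name__ == "__main__":
    # Four problems with one sample each; three of the four samples pass.
    print(benchmark_pass_at_1([(1, 1), (1, 0), (1, 1), (1, 1)]))
```

With one sample per problem, Pass@1 reduces to the share of problems whose single generated solution passes all tests.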