CodeSim (GPT4)

Reported on 4 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

Code Generation on APPS
Competition Pass@1 · 2025-02-08
0.81
best: 34.8 (LPW (GPT-4o))
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging · arXiv:2502.05664
Code Generation on APPS
Interview Pass@1 · 2025-02-08
4.21
best: 65.2 (LPW (GPT-4o))
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging · arXiv:2502.05664
Code Generation on APPS
Introductory Pass@1 · 2025-02-08
26.04
best: 87.2 (LPW (GPT-4o))
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging · arXiv:2502.05664
Code Generation on CodeContests
Test Set pass@1 · 2025-02-08
28.4
best: 58.18 (EG-CFG (DeepSeek-V3-0324))
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging · arXiv:2502.05664