| 1 | EG-CFG (DeepSeek-V3-0324) | 96.6 | No | Execution Guided Line-by-Line Code Generation | 2025-06-12 | Code |
| 2 | QualityFlow (Sonnet-3.5) | 94.2 | No | QualityFlow: An Agentic Workflow for Program Syn... | 2025-01-20 | - |
| 3 | o1-mini + MapCoder (Hamming.ai) | 93.2 | Yes | MapCoder: Multi-Agent Code Generation for Compet... | 2024-05-18 | Code |
| 4 | MGDebugger (DeepSeek-V3-0324) | 92.4 | No | From Code to Correctness: Closing the Last Mile ... | 2024-10-02 | Code |
| 5 | GPT-4 + AgentCoder | 91.8 | No | AgentCoder: Multi-Agent-based Code Generation wi... | 2023-12-20 | Code |
| 6 | CodeSim (GPT-4o) | 90.7 | No | CODESIM: Multi-Agent Code Generation and Problem... | 2025-02-08 | Code |
| 7 | Jiutian Large Model | 90 | No | - | - | - |
| 8 | GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | No | AgentCoder: Multi-Agent-based Code Generation wi... | 2023-12-20 | Code |
| 9 | MapCoder (GPT-4o) | 89.7 | No | MapCoder: Multi-Agent Code Generation for Compet... | 2024-05-18 | Code |
| 10 | GPT-4 (ChatGPT Plus) | 87.5 | No | How Does Naming Affect LLMs on Code Analysis Tas... | 2023-07-24 | - |
| 11 | Claude 3 Opus | 86.4 | No | - | - | - |
| 12 | LPW (GPT-4o) | 84.8 | No | Planning-Driven Programming: A Large Language Mo... | 2024-11-21 | Code |
| 13 | AFlow (GPT-4o-mini) | 83.4 | No | AFlow: Automating Agentic Workflow Generation | 2024-10-14 | Code |
| 14 | GPT-3.5 Turbo (ChatGPT) | 83.2 | No | How Does Naming Affect LLMs on Code Analysis Tas... | 2023-07-24 | - |
| 15 | EG-CFG (DeepSeek Coder 1.3b Instruct) | 83.2 | No | Execution Guided Line-by-Line Code Generation | 2025-06-12 | Code |
| 16 | MapCoder (GPT-4) | 83.1 | No | MapCoder: Multi-Agent Code Generation for Compet... | 2024-05-18 | Code |
| 17 | o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | No | Language Agent Tree Search Unifies Reasoning Act... | 2023-10-06 | Code |
| 18 | GPT-4 (Bing Chat) | 82 | No | How Does Naming Affect LLMs on Code Analysis Tas... | 2023-07-24 | - |
| 19 | GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | No | Language Agent Tree Search Unifies Reasoning Act... | 2023-10-06 | Code |
| 20 | MGDebugger (CodeQwen1.5) | 80.8 | No | From Code to Correctness: Closing the Last Mile ... | 2024-10-02 | Code |
| 21 | Claude 3 Haiku | 80.4 | No | - | - | - |
| 22 | GPT-4 (Self-Debugging with unit tests + trace) | 80.2 | No | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 23 | GPT-4 (few-shot) | 80 | Yes | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 24 | Claude 3 Sonnet | 79.4 | No | - | - | - |
| 25 | Bard (PaLM 2/chat-bison-001) | 76.2 | No | How Does Naming Affect LLMs on Code Analysis Tas... | 2023-07-24 | - |
| 26 | GPT-3.5 Turbo (Self-Debugging with unit tests + trace) | 72.8 | No | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 27 | Claude | 71.4 | No | How Does Naming Affect LLMs on Code Analysis Tas... | 2023-07-24 | - |
| 28 | code-davinci-002 175B (Self-Debugging with unit tests + trace) | 70.8 | No | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 29 | GPT-3.5 Turbo (few-shot) | 70.8 | Yes | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 30 | DeepSeek-Coder-Instruct 33B (few-shot) | 70 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 31 | GPT-3.5 Turbo + INTERVENOR | 69.8 | No | INTERVENOR: Prompting the Coding Ability of Larg... | 2023-11-16 | Code |
| 32 | code-davinci-002 175B + LEVER | 68.9 | No | LEVER: Learning to Verify Language-to-Code Gener... | 2023-02-16 | Code |
| 33 | code-davinci-002 175B + CodeT | 67.7 | No | CodeT: Code Generation with Generated Tests | 2022-07-21 | Code |
| 34 | GPT-3.5 Turbo (3-shot) | 67.6 | Yes | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 35 | code-davinci-002 175B + Reviewer | 66.9 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 36 | code-davinci-002 175B + Coder-Reviewer | 66.4 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 37 | StarCoder2-15B | 66.2 | No | StarCoder 2 and The Stack v2: The Next Generation | 2024-02-29 | Code |
| 38 | DeepSeek-Coder-Base 33B (few-shot) | 66 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 39 | Code Llama - Python 70B (3-shot) | 65.5 | Yes | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 40 | DeepSeek-Coder-Instruct 6.7B (few-shot) | 65.4 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 41 | code-davinci-002 175B + MBR-Exec | 63 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 42 | Code Llama 70B (3-shot) | 62.4 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 43 | Code Llama - Instruct 70B (3-shot) | 62.2 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 44 | code-davinci-001 175B + CodeT | 61.9 | No | CodeT: Code Generation with Generated Tests | 2022-07-21 | Code |
| 45 | code-davinci-002 175B (3-shot) | 61.4 | Yes | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 46 | Unnatural Code Llama 34B (3-shot) | 61.2 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 47 | Mixtral 8x7B (3-shot) | 60.7 | No | Mixtral of Experts | 2024-01-08 | Code |
| 48 | DeepSeek-Coder-Base 6.7B (few-shot) | 60.6 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 49 | code-davinci-001 175B + MBR-Exec | 58.2 | No | Natural Language to Code Translation with Execut... | 2022-04-25 | Code |
| 50 | Code Llama - Instruct 34B (3-shot) | 57 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 51 | Code Llama - Python 34B (3-shot) | 56.2 | Yes | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 52 | code-cushman-001 12B (CodeT) | 55.4 | No | CodeT: Code Generation with Generated Tests | 2022-07-21 | Code |
| 53 | Code Llama 34B (3-shot) | 55 | Yes | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 54 | StarCoder 15.5B (Self-Debugging with unit tests + trace) | 53.2 | No | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 55 | StarCoder 15.5B | 52.7 | No | StarCoder: may the source be with you! | 2023-05-09 | Code |
| 56 | GPT-3.5 Turbo | 52.2 | Yes | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 57 | WizardCoder 15B | 51.8 | Yes | WizardCoder: Empowering Code Large Language Mode... | 2023-06-14 | Code |
| 58 | PaLM 2-S* (few-shot) | 50 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 59 | CodeGen-Mono 16B + CodeT | 49.5 | No | CodeT: Code Generation with Generated Tests | 2022-07-21 | Code |
| 60 | Code Llama - Instruct 13B (3-shot) | 49.4 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 61 | DeepSeek-Coder-Instruct 1.3B (few-shot) | 49.4 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 62 | StarCoderBase 15.5B | 49 | No | StarCoder: may the source be with you! | 2023-05-09 | Code |
| 63 | Code Llama - Python 13B (3-shot) | 49 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 64 | Qwen2idae-16x14B (4-shot) | 48.6 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 65 | code-cushman-001 12B + MBR-Exec | 48.3 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 66 | Code Llama - Python 7B (3-shot) | 47.6 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 67 | Mistral 7B (3-shot) | 47.5 | No | Mistral 7B | 2023-10-10 | Code |
| 68 | CodeGen 16B + MBR-Exec | 47.3 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 69 | StarCoder 15.5B (3-shot) | 47.2 | No | Teaching Large Language Models to Self-Debug | 2023-04-11 | Code |
| 70 | PaLM Coder 540B | 47 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 71 | Code Llama 13B (3-shot) | 47 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 72 | CodeGen 16B + Coder-Reviewer | 46.2 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 73 | DeepSeek-Coder-Base 1.3B (few-shot) | 46.2 | No | DeepSeek-Coder: When the Large Language Model Me... | 2024-01-25 | Code |
| 74 | GPT-3.5 Turbo (few-shot) | 45.4 | No | INTERVENOR: Prompting the Coding Ability of Larg... | 2023-11-16 | Code |
| 75 | Llama 2 70B (zero-shot) | 45 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 76 | Code Llama - Instruct 7B (3-shot) | 44.4 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 77 | CodeGen 16B + Reviewer | 44.1 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 78 | phi-1.5-web 1.3B | 43.5 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 79 | Branch-Train-Merge 4x7B (top-2) | 42.6 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 80 | Code Llama 7B (3-shot) | 41.4 | No | Code Llama: Open Foundation Models for Code | 2023-08-24 | Code |
| 81 | Camelidae-8×34B (4-shot) | 41.4 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 82 | GPT-3.5 Turbo (0-shot) | 39.8 | No | INTERVENOR: Prompting the Coding Ability of Larg... | 2023-11-16 | Code |
| 83 | Branch-Train-MiX 4x7B (sampling top-2 experts) | 39.4 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 84 | LLaMA 65B (0-shot) | 37.7 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 85 | PaLM 540B | 36.8 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 86 | SantaCoder 1.1B | 35 | No | StarCoder: may the source be with you! | 2023-05-09 | Code |
| 87 | InCoder 6.7B + CodeT | 34.4 | No | CodeT: Code Generation with Generated Tests | 2022-07-21 | Code |
| 88 | Llama 2 34B (0-shot) | 33 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 89 | Llama 2 13B (0-shot) | 30.6 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 90 | LLaMA 33B (0-shot) | 30.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 91 | InCoder 6.7B + MBR-Exec | 26.7 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 92 | InCoder 6.7B + Coder-Reviewer | 26.1 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 93 | InCoder 6.7B + Reviewer | 24.4 | No | Coder Reviewer Reranking for Code Generation | 2022-11-29 | Code |
| 94 | CodeGeeX-13B | 24.4 | No | CodeGeeX: A Pre-Trained Model for Code Generatio... | 2023-03-30 | Code |
| 95 | LLaMA 13B (0-shot) | 22 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 96 | Llama 2 7B (0-shot) | 20.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 97 | InCoder 6.7B (0-shot) | 19.4 | No | InCoder: A Generative Model for Code Infilling a... | 2022-04-12 | Code |
| 98 | LLaMA 7B (0-shot) | 17.7 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |