TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/MapCoder: Multi-Agent Code Generation for Competitive Prob...

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

2024-05-18Program SynthesisCode GenerationHumanEval
PaperPDFCodeCode(official)

Abstract

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests, presents a significant challenge. While large language models (LLMs) demonstrate impressive proficiency in natural language processing, their performance in code generation tasks remains limited. In this paper, we introduce a new approach to code generation tasks leveraging multi-agent prompting that uniquely replicates the full cycle of program synthesis as observed in human developers. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of this cycle: recalling relevant examples, planning, code generation, and debugging. After conducting thorough experiments, with multiple LLM ablations and analyses across eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder showcases remarkable code generation capabilities, achieving new state-of-the-art results (pass@1) on HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%). Moreover, our method consistently delivers superior performance across various programming languages and varying problem difficulties. We open-source our framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.

Results

TaskDatasetMetricValueModel
Code GenerationCodeContestsTest Set pass@128.5MapCoder (GPT-4)
Code GenerationCodeContestsTest Set pass@535.2MapCoder (GPT-4)
Code GenerationCodeContestsVal Set pass@128.5MapCoder (GPT-4)
Code GenerationMBPPAccuracy93.2o1-mini + MapCoder (Hamming.ai)
Code GenerationMBPPAccuracy89.7MapCoder (GPT-4o)
Code GenerationMBPPAccuracy83.1MapCoder (GPT-4)
Code GenerationHumanEvalPass@193.9Mistral 7B

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning2025-07-18Towards Formal Verification of LLM-Generated Code from Natural Language Prompts2025-07-17MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks2025-07-16Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training2025-07-16The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs2025-07-15Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding2025-07-14CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks2025-07-14CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance2025-07-14