Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PECC: Problem Extraction and Coding Challenges

Patrick Haller, Jonas Golde, Alan Akbik

2024-04-29 · Math · Text Generation · Code Generation
Paper · PDF · Code (official)

Abstract

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving, and reasoning. Existing benchmarks evaluate these tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions remains unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent of Code (AoC) challenges and Project Euler, comprising 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and neutral problems, with particular difficulty on the math-based Euler subset: GPT-3.5-Turbo passes 50% of the AoC challenges but only 8% of the Euler problems. By probing the limits of LLMs' capabilities, our benchmark provides a framework to monitor and assess the subsequent progress of LLMs as universal problem solvers.
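PECC scores a model by whether its generated code actually executes and produces the challenge's expected answer. A minimal sketch of such an execution check follows; the function name, timeout value, and string comparison are illustrative assumptions, not the authors' evaluation harness:

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str, stdin_text: str = "", timeout: float = 10.0) -> str:
    """Run a candidate solution in a subprocess and return its stripped stdout.
    Illustrative harness only: a real benchmark would also sandbox the
    process and handle crashes and timeouts explicitly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

# A generated solution passes if its output matches the expected answer string.
candidate = "print(sum(range(1, 11)))"
print(run_candidate(candidate))  # → 55
```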

Results

Task             Dataset  Metric  Value  Model
Code Generation  PECC     Pass@3  27.67  Claude 3 Haiku
Code Generation  PECC     Pass@3  23.75  GPT-3.5 Turbo
Code Generation  PECC     Pass@3  11.39  codechat-bison
Code Generation  PECC     Pass@3   8.48  chat-bison
Code Generation  PECC     Pass@3   8.35  Mixtral-8x7B-Instruct
Code Generation  PECC     Pass@3   7.18  Phi-3-mini-128k-instruct
Code Generation  PECC     Pass@3   3.72  WizardLM-2-7B
Code Generation  PECC     Pass@3   3.10  Llama-3-8B-Instruct
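Pass@k scores such as the Pass@3 values above are commonly computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021); whether PECC uses this exact estimator is an assumption here, sketched for reference:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions per problem,
    of which c are correct, return the probability that at least one of
    k randomly drawn samples is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must succeed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 samples, 1 correct, budget k=3
print(pass_at_k(5, 1, 3))  # → 0.6
```

The benchmark-level score is then the mean of `pass_at_k` over all problems, expressed as a percentage.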

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)