Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LPW (GPT-4o)

Reported on 9 benchmarks (dataset/metric pairs over 6 datasets) across 1 task · 1 paper · 8 SOTA
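
The summary counts can be checked against the nine entries listed below. A minimal sketch, assuming each entry is reduced to a hypothetical (dataset, metric, sota_flag) tuple transcribed by hand from this page: it shows that the nine "benchmarks" are nine distinct dataset/metric pairs spread over six datasets (APPS alone contributes four), and that eight of them carry the SOTA flag.

```python
# Entries transcribed from this page as hypothetical (dataset, metric, sota_flag) tuples.
ENTRIES = [
    ("HumanEval-ET", "Pass@1", True),
    ("APPS", "Competition Pass@1", True),
    ("APPS", "Interview Pass@1", True),
    ("APPS", "Introductory Pass@1", True),
    ("APPS", "Pass@1", True),
    ("CodeContests", "Test Set pass@1", True),
    ("Livecodebench", "Acc", True),
    ("MBPP-ET", "Pass@1", True),
    ("MBPP", "Accuracy", False),
]


def summarize(entries):
    """Count results, distinct datasets, distinct dataset/metric pairs and SOTA flags."""
    return {
        "results": len(entries),
        "datasets": len({dataset for dataset, _, _ in entries}),
        "benchmarks": len({(dataset, metric) for dataset, metric, _ in entries}),
        "sota": sum(1 for _, _, sota in entries if sota),
    }


print(summarize(ENTRIES))
# {'results': 9, 'datasets': 6, 'benchmarks': 9, 'sota': 8}
```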

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
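
Exact-name matching behaves like a dictionary keyed on the raw model-name string. The sketch below is an illustrative assumption about that kind of grouping, not the site's actual code; apart from the arXiv ID of this page's paper, the paper IDs and variant labels are hypothetical placeholders. It shows both failure modes: two different variants reported under the identical name are merged, while a name differing only in letter case lands in a separate group.

```python
from collections import defaultdict

# Hypothetical records: (model_name, paper_id, variant_actually_used).
# "paper-X" / "paper-Y" and the variant labels are made-up placeholders.
RECORDS = [
    ("LPW (GPT-4o)", "arXiv:2411.14503", "variant A (hypothetical)"),
    ("LPW (GPT-4o)", "paper-X (hypothetical)", "variant B (hypothetical)"),
    ("lpw (gpt-4o)", "paper-Y (hypothetical)", "variant C (hypothetical)"),
]


def group_by_exact_name(records):
    """Group records on the raw model-name string (case- and whitespace-sensitive)."""
    groups = defaultdict(list)
    for name, paper, variant in records:
        groups[name].append((paper, variant))
    return dict(groups)


groups = group_by_exact_name(RECORDS)
print(len(groups))                  # 2 groups: the lowercase name is kept apart
print(len(groups["LPW (GPT-4o)"]))  # 2 records, possibly two different variants
```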

Natural Language Processing · 9 results

  • Code Generation on HumanEval-ET
    Pass@1 · 2024-11-21
    65.8
    best: 87.19 (EG-CFG (DeepSeek-V3-0324))
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on APPS
    Competition Pass@1 · 2024-11-21
    34.8
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on APPS
    Interview Pass@1 · 2024-11-21
    65.2
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on APPS
    Introductory Pass@1 · 2024-11-21
    87.2
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on APPS
    Pass@1 · 2024-11-21
    62.6
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on CodeContests
    Test Set pass@1 · 2024-11-21
    34.7
    best: 58.18 (EG-CFG (DeepSeek-V3-0324))
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on Livecodebench
    Acc · 2024-11-21
    59.3
    best: 91.6 (Xolver)
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on MBPP-ET
    Pass@1 · 2024-11-21
    65.8
    best: 73 (EG-CFG (DeepSeek-V3-0324))
    SOTA
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
  • Code Generation on MBPP
    Accuracy · 2024-11-21
    84.8
    best: 96.6 (EG-CFG (DeepSeek-V3-0324))
    Planning-Driven Programming: A Large Language Model Programming Workflow · arXiv:2411.14503
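
Five of the entries above also show a best listed score, and the distance between LPW (GPT-4o) and that best result is a plain subtraction. A minimal sketch, assuming the scores are on the same scale as printed (presumably percentages); the pairs are transcribed from the list above and rounded to two decimals to hide floating-point noise. Four of these five entries are flagged SOTA despite trailing the listed best, so the flag apparently does not mean "current best"; the page does not say what it does mean.

```python
# (benchmark, LPW (GPT-4o) score, best listed score), transcribed from this page.
PAIRS = [
    ("HumanEval-ET Pass@1", 65.8, 87.19),
    ("CodeContests Test Set pass@1", 34.7, 58.18),
    ("Livecodebench Acc", 59.3, 91.6),
    ("MBPP-ET Pass@1", 65.8, 73.0),
    ("MBPP Accuracy", 84.8, 96.6),
]


def gaps_to_best(pairs):
    """Return {benchmark: best - score}, rounded to 2 decimals to hide float noise."""
    return {name: round(best - score, 2) for name, score, best in pairs}


# Print the gaps from smallest to largest.
for name, gap in sorted(gaps_to_best(PAIRS).items(), key=lambda kv: kv[1]):
    print(f"{name}: {gap} points behind the best listed result")
```

Sorting by gap puts MBPP-ET (7.2 points) first and Livecodebench (32.3 points) last.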