GPT-3.5 Turbo + INTERVENOR

Reported on 1 benchmark across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 1 result

Code Generation on MBPP
Accuracy · 2023-11-16
69.8
best: 96.6 (EG-CFG (DeepSeek-V3-0324))
INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair · arXiv:2311.09868