ChatDev

Reported on 3 benchmarks across 1 task

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 3 results

Code Generation on DSEval-LeetCode
Pass Rate
32.5
best: 57.5 (Jupyter-AI)
Code Generation on DSEval-LeetCode
w/o Intact
32.5
best: 57.5 (Jupyter-AI)
Code Generation on DSEval-LeetCode
w/o PE
50
best: 70 (Jupyter-AI)