ChatDev
Reported on 3 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing: 3 results
- Pass Rate: 32.5 (best: 57.5, Jupyter-AI)
- w/o Intact: 32.5 (best: 57.5, Jupyter-AI)
- w/o PE: 50 (best: 70, Jupyter-AI)