Metric: pass@1 (higher is better)
| # | Model | pass@1 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | QurrentOS-coder + Claude 3.5 Sonnet | 58 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 2 | QurrentOS-coder + GPT-4o | 46 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 3 | QurrentOS-coder + GPT-4 Turbo | 37 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 4 | QurrentOS-coder + Claude 3 Opus | 36 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 5 | QurrentOS-coder + GPT-4 | 30 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 6 | QurrentOS-coder + Gemini 1.5 Pro | 30 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 7 | QurrentOS-coder + DeepSeek-Coder-V2 | 29 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 8 | QurrentOS-coder + Llama 3 70B | 20 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
| 9 | QurrentOS-coder + Qwen-72B-Instruct | 18 | No | RES-Q: Evaluating Code-Editing Large Language Mo... | 2024-06-24 | Code |
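To make the metric concrete, here is a minimal sketch of how a pass@1 score like those above can be computed. It assumes each score is the percentage of benchmark tasks whose generated edit passes that task's tests. It uses the standard unbiased pass@k estimator from the HumanEval work, which reduces to c/n when k = 1. The function names and the `(n, c)` input format are hypothetical and are not taken from the RES-Q code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of sampled attempts, c: how many of them passed the tests.
    Returns the probability that at least one of k attempts drawn
    without replacement from the n samples passes.
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all tasks, as a percentage.

    results: one (n, c) pair per task (hypothetical format).
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Example: 100 tasks, one attempt each, 58 solved.
results = [(1, 1)] * 58 + [(1, 0)] * 42
print(benchmark_pass_at_1(results))
```

With one attempt per task, pass@1 is simply the fraction of tasks solved. On a 100-task benchmark, a score of 58 would then correspond to 58 tasks solved.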