Metric: Prompt-level loose-accuracy (higher is better)
| # | Model | Prompt-level loose-accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | AutoIF (Llama3 70B) | 85.6 | No | Self-play with Execution Feedback: Improving Ins... | 2024-06-19 | Code |
| 2 | AutoIF (Qwen2 72B) | 82.3 | No | Self-play with Execution Feedback: Improving Ins... | 2024-06-19 | Code |
| 3 | GPT-4 | 79.3 | No | Instruction-Following Evaluation for Large Langu... | 2023-11-14 | Code |
| 4 | PaLM 2 S | 46.95 | No | Instruction-Following Evaluation for Large Langu... | 2023-11-14 | Code |