Metric: Inst-level strict-accuracy (higher is better)
| # | Model | Inst-level strict-accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | AutoIF (Llama3 70B) | 86.7 | No | Self-play with Execution Feedback: Improving Ins... | 2024-06-19 | Code |
| 2 | AutoIF (Qwen2 72B) | 86.1 | No | Self-play with Execution Feedback: Improving Ins... | 2024-06-19 | Code |
| 3 | GPT-4 | 83.57 | No | Instruction-Following Evaluation for Large Langu... | 2023-11-14 | Code |
| 4 | PaLM 2 S | 55.76 | No | Instruction-Following Evaluation for Large Langu... | 2023-11-14 | Code |
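The sketch below illustrates how an instruction-level score differs from a prompt-level one. It assumes an IFEval-style setup in which each prompt carries one or more verifiable instructions, and each instruction is judged followed or not under the strict check (the response as-is, with no loosening transformations). The input format and function names are hypothetical. Instruction-level accuracy pools every instruction across all prompts. Prompt-level accuracy, included for contrast, credits a prompt only when all of its instructions pass.

```python
from typing import Sequence


def inst_level_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of individual instructions followed, pooled over all prompts.

    Hypothetical input format: results[i][j] is True if the response to
    prompt i satisfied its j-th instruction under the strict check.
    """
    total = sum(len(flags) for flags in results)
    if total == 0:
        raise ValueError("no instructions to score")
    followed = sum(sum(flags) for flags in results)  # True counts as 1
    return 100.0 * followed / total


def prompt_level_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of prompts whose instructions were *all* followed."""
    if not results:
        raise ValueError("no prompts to score")
    return 100.0 * sum(all(flags) for flags in results) / len(results)


# Toy data: 3 prompts carrying 2, 1, and 3 instructions respectively.
toy = [[True, False], [True], [True, True, True]]
print(round(inst_level_accuracy(toy), 2))    # 5 of 6 instructions -> 83.33
print(round(prompt_level_accuracy(toy), 2))  # 2 of 3 prompts -> 66.67
```

Because a single failed instruction sinks an entire prompt, the prompt-level figure is typically lower than the instruction-level one on the same outputs.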