AutoIF (Llama3 70B)

Reported on 4 benchmarks across 1 task · 1 paper · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

Instruction Following on IFEval
Inst-level loose-accuracy · 2024-06-19
90.4
SOTA
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models arXiv:2406.13542
Instruction Following on IFEval
Inst-level strict-accuracy · 2024-06-19
86.7
SOTA
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models arXiv:2406.13542
Instruction Following on IFEval
Prompt-level loose-accuracy · 2024-06-19
85.6
SOTA
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models arXiv:2406.13542
Instruction Following on IFEval
Prompt-level strict-accuracy · 2024-06-19
80.2
SOTA
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models arXiv:2406.13542