PaLM 2 S

Reported on 4 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

Instruction Following on IFEval
Inst-level loose-accuracy · 2023-11-14
59.11
best: 90.4 (AutoIF (Llama3 70B))
Instruction-Following Evaluation for Large Language Models arXiv:2311.07911
Instruction Following on IFEval
Inst-level strict-accuracy · 2023-11-14
55.76
best: 86.7 (AutoIF (Llama3 70B))
Instruction-Following Evaluation for Large Language Models arXiv:2311.07911
Instruction Following on IFEval
Prompt-level loose-accuracy · 2023-11-14
46.95
best: 85.6 (AutoIF (Llama3 70B))
Instruction-Following Evaluation for Large Language Models arXiv:2311.07911
Instruction Following on IFEval
Prompt-level strict-accuracy · 2023-11-14
43.07
best: 80.2 (AutoIF (Llama3 70B))
Instruction-Following Evaluation for Large Language Models arXiv:2311.07911