Claude 3 Opus (5-shot)

Reported on 2 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 2 results

Question Answering on PubMedQA
Accuracy: 75.8
best: 81.6 (Meditron-70B (CoT + SC))

Common Sense Reasoning on WinoGrande
Accuracy: 88.5
best: 96.1 (ST-MoE-32B 269B (fine-tuned))