Claude 3.5 Sonnet
Reported on 8 benchmarks across 4 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing: 6 results
- Group Score: 10.6; best: 35 (GPT-4o (CoT))
- Text Score: 32.8; best: 59.2 (GPT-4o (CoT))
- Video Score: 28.8; best: 51 (GPT-4o (CoT))
- Group Score: 10.6; best: 35 (GPT-4o (CoT))
- Text Score: 32.8; best: 59.2 (GPT-4o (CoT))
- Video Score: 28.8; best: 51 (GPT-4o (CoT))
Knowledge Base: 1 result
- Accuracy: 0.01; best: 0.252 (o3)
Computer Vision: 1 result
- Total Column Score: 463 (uses extra data)