Claude 3.5 Sonnet

Reported on 8 benchmarks across 4 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing (6 results)

Relation Extraction on Vinoground
Group Score
10.6
best: 35 (GPT-4o (CoT))
Relation Extraction on Vinoground
Text Score
32.8
best: 59.2 (GPT-4o (CoT))
Relation Extraction on Vinoground
Video Score
28.8
best: 51 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Group Score
10.6
best: 35 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Text Score
32.8
best: 59.2 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Video Score
28.8
best: 51 (GPT-4o (CoT))