Phi-3.5-Vision

Reported on 6 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 6 results

Relation Extraction on Vinoground
Group Score
6.2
best: 35 (GPT-4o (CoT))
Relation Extraction on Vinoground
Text Score
24
best: 59.2 (GPT-4o (CoT))
Relation Extraction on Vinoground
Video Score
22.4
best: 51 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Group Score
6.2
best: 35 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Text Score
24
best: 59.2 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Video Score
22.4
best: 51 (GPT-4o (CoT))