ROSITA (Flickr30k)
Reported on 3 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Reasoning (3 results)
- Group Score: 12.25; best: 58.75 (GPT-4V (CoT, pick b/w two options))
- Image Score: 15.25; best: 68.75 (GPT-4V (CoT, pick b/w two options))
- Text Score: 35.25; best: 75.5 (GPT-4o + CA)