GPT-4V (CoT, pick b/w two options)

Reported on 3 benchmarks across 1 task · 1 paper · 3 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning · 3 results

Visual Reasoning on Winoground
Group Score · 2023-11-15
58.75
SOTA
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task arXiv:2311.09193
Visual Reasoning on Winoground
Image Score · 2023-11-15
68.75
SOTA
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task arXiv:2311.09193
Visual Reasoning on Winoground
Text Score · 2023-11-15
75.25
best: 75.5 (GPT-4o + CA)
SOTA
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task arXiv:2311.09193