CLIP (ViT-L/14)

Reported on 2 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning · 2 results

Visual Reasoning on Winoground
Image Score · 2023-11-17
8
best: 68.75 (GPT-4V (CoT, pick b/w two options))
SelfEval: Leveraging the discriminative nature of generative models for evaluation arXiv:2311.10708
Visual Reasoning on Winoground
Text Score · 2023-11-17
30.25
best: 75.5 (GPT-4o + CA)
SelfEval: Leveraging the discriminative nature of generative models for evaluation arXiv:2311.10708