Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Reasoning on Winoground

Metric: Group Score (higher is better)
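
The Group Score counts an example as correct only when both its caption matching (text score) and its image matching (image score) are correct. The following is a minimal sketch, assuming the definition given in the Winoground paper and a hypothetical data layout in which `s[c][i]` holds a model's match score for caption `c` and image `i` of one example; the function names are illustrative and not taken from any official evaluation code.

```python
from itertools import permutations


def text_correct(s):
    # For each image, the matching caption must outscore the other caption.
    return s[0][0] > s[1][0] and s[1][1] > s[0][1]


def image_correct(s):
    # For each caption, the matching image must outscore the other image.
    return s[0][0] > s[0][1] and s[1][1] > s[1][0]


def group_correct(s):
    # An example counts toward the Group Score only if both checks pass.
    return text_correct(s) and image_correct(s)


def group_score(examples):
    # Percentage of examples that are group-correct.
    return 100.0 * sum(group_correct(s) for s in examples) / len(examples)


def chance_group_score():
    # Enumerate every strict ordering of the four scores of one example
    # and count how many orderings are group-correct.
    orders = list(permutations([1, 2, 3, 4]))
    hits = sum(group_correct([[a, b], [c, d]]) for a, b, c, d in orders)
    return 100.0 * hits / len(orders)


# Two made-up examples: the first passes both checks, the second fails
# the image check because caption 0 scores higher with image 1.
examples = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.6], [0.4, 0.7]],
]
print(group_score(examples))
print(round(chance_group_score(), 2))
```

Under these assumptions, 4 of the 24 possible orderings of the four scores are group-correct, which is consistent with the 16.67 "Random chance" entry on the leaderboard.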

Results

| # | Model | Group Score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4V (CoT, pick b/w two options) | 58.75 | No | The Role of Chain-of-Thought in Complex Vision-L... | 2023-11-15 | - |
| 2 | GPT-4o + CA | 52 | No | A Cognitive Paradigm Approach to Probe the Perce... | 2025-01-23 | - |
| 3 | MMICL + CoCoT | 50.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 4 | MMICL + CCoT | 47.5 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 5 | GPT-4V + CoCoT | 44.5 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 6 | MMICL (FLAN-T5-XXL) | 43 | No | MMICL: Empowering Vision-language Model with Mul... | 2023-09-14 | Code |
| 7 | OpenFlamingo + CoCoT | 41.5 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 8 | GPT-4V (pick b/w two options) | 39.25 | No | The Role of Chain-of-Thought in Complex Vision-L... | 2023-11-15 | - |
| 9 | OpenFlamingo + DDCoT | 39 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 10 | GPT-4V (image-caption match answer yes/no, zero-shot) | 38 | No | - | - | - |
| 11 | GPT-4V | 37.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 12 | MMICL + DDCoT | 36.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 13 | OpenFlamingo | 33.25 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 14 | VQ2 | 30.5 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 15 | PaLI (ft SNLI-VE + Synthetic Data) | 28.75 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 16 | PaLI (ft SNLI-VE) | 28.7 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 17 | Gemini + CoCoT | 27.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 18 | FIBER (EqSim) | 27.5 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 19 | Gemini | 25 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 20 | Gemini + DDCoT | 23.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 21 | BLIP2 (ft COCO) | 23.5 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 22 | BLIP2 (SGVL) | 23.3 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 23 | FIBER (finetuned, Flickr30k) | 23 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 24 | LLaVA-1.5-CCoT | 22.3 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 25 | FIBER | 22.25 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 26 | X-VLM 4M | 21.5 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 27 | BLIP (SGVL) | 21.5 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 28 | X-VLM 16M | 21.2 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 29 | Gemini + CCoT | 20.75 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 30 | NegBLIP2 | 20.5 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 31 | LLaVA-1.5 | 20.1 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 32 | OpenFlamingo + CCoT | 20 | No | CoCoT: Contrastive Chain-of-Thought Prompting fo... | 2024-01-05 | Code |
| 33 | BLIP2 | 19 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 34 | BLIP (+Graph Text, +Graph Neg) | 19 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 35 | METER (EqSim) | 18.75 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 36 | NegBLIP | 18.5 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 37 | KeyComp* (GPT-4) | 18.2 | No | Prompting Large Vision-Language Models for Compo... | 2024-01-20 | Code |
| 38 | OFA large (TLC-A) | 17.5 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 39 | KeyComp* (GPT-3.5) | 17.4 | No | Prompting Large Vision-Language Models for Compo... | 2024-01-20 | Code |
| 40 | BLIP (VisualGPTScore, α-tuned) | 16.8 | No | Revisiting the Role of Language Priors in Vision... | 2023-06-02 | Code |
| 41 | Random chance | 16.67 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 42 | BLIP (+Graph Text) | 16.5 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 43 | IAIS large (Flickr30k) | 16 | No | - | - | - |
| 44 | IAIS large (COCO) | 15.5 | No | - | - | - |
| 45 | BLIP | 15 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 46 | METER (finetuned, Flickr30k) | 14.75 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 47 | VinVL | 14.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 48 | BLIP 14M | 14.5 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 49 | CACR base | 14.25 | No | - | - | - |
| 50 | FLAVA (ITM) | 14.25 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 51 | OFA base (TLC-A) | 13.75 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 52 | BLIP (ITM) | 13.3 | No | Revisiting the Role of Language Priors in Vision... | 2023-06-02 | Code |
| 53 | LLaVA | 13 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 54 | ALBEF 14M | 12.7 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 55 | KeyComp (GPT-3.5) | 12.4 | No | Prompting Large Vision-Language Models for Compo... | 2024-01-20 | Code |
| 56 | LLaVA-1.5-ZS-CoT | 12.3 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 57 | ROSITA (Flickr30k) | 12.25 | No | - | - | - |
| 58 | BLIP 129M (CapFilt/L) | 12.2 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 59 | BLIP-ViT/L 129M | 12.2 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 60 | PEVL 14M | 12.2 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 61 | METER | 12 | No | Equivariant Similarity for Vision-Language Found... | 2023-03-25 | Code |
| 62 | BLIP 129M | 11.7 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 63 | MiniGPT-4-7B (GPTScore) | 11.5 | No | An Examination of the Compositionality of Large ... | 2023-08-21 | Code |
| 64 | TIFA | 11.3 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 65 | ViLLA large | 11 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 66 | ALBEF 4M | 11 | No | Measuring Progress in Fine-grained Vision-and-La... | 2023-05-12 | Code |
| 67 | UNITER large | 10.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 68 | LLaVA-7B (GPTScore) | 10.5 | No | An Examination of the Compositionality of Large ... | 2023-08-21 | Code |
| 69 | CLIP RN50x64 | 10.25 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 70 | UNITER base | 10 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 71 | CLIP (SGVL) | 9.8 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 72 | syn-CLIP | 9.5 | No | Going Beyond Nouns With Vision & Language Models... | 2023-03-30 | Code |
| 73 | MiniGPT-4 | 9.5 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 74 | MiniGPT-4-7B (VisualGPTScore) | 9.5 | No | An Examination of the Compositionality of Large ... | 2023-08-21 | Code |
| 75 | ViLT (ViT-B/32) | 9.25 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 76 | OFA large (ft SNLI-VE) | 9 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 77 | FLAVA (contrastive) | 9 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 78 | InstructBLIP-CCoT | 8.3 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 79 | syn-CyCLIP | 8.25 | No | Going Beyond Nouns With Vision & Language Models... | 2023-03-30 | Code |
| 80 | COCA ViT-L14 (f.t on COCO) | 8.25 | No | What You See is What You Read? Improving Text-Im... | 2023-05-17 | Code |
| 81 | CLIP (ViT-B/32) | 8 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 82 | ViLLA base | 8 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 83 | NegCLIP | 8 | No | Incorporating Structured Representations into Pr... | 2023-05-10 | - |
| 84 | IDEFICS 80B | 8 | No | - | - | - |
| 85 | OFA large (ITM) | 7.25 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 86 | CyCLIP | 7.25 | No | Going Beyond Nouns With Vision & Language Models... | 2023-03-30 | Code |
| 87 | OFA tiny (TLC-A) | 6.75 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 88 | BLIP (ITC) | 6.5 | No | Revisiting the Role of Language Priors in Vision... | 2023-06-02 | Code |
| 89 | OFA base (ITM) | 6.5 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 90 | IDEFICS 9B | 5 | No | - | - | - |
| 91 | ViLBERT base | 4.75 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 92 | OFA tiny (ITM) | 4.5 | No | Simple Token-Level Confidence Improves Caption C... | 2023-05-11 | - |
| 93 | VSE++ (Flickr30k, VGG) | 4.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 94 | VSE++ (COCO, ResNet) | 4 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 95 | UniT (ITM finetuned) | 4 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 96 | LXMERT | 4 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 97 | InstructBLIP-ZS-CoT | 4 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 98 | VSRN (COCO) | 3.75 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 99 | VSRN (Flickr30k) | 3.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 100 | VSE++ (COCO, VGG) | 3.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 101 | InstructBLIP | 3.3 | No | Compositional Chain-of-Thought Prompting for Lar... | 2023-11-27 | Code |
| 102 | VSE++ (Flickr30k, ResNet) | 2.75 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |
| 103 | MiniGPT-4-7B (BERTScore) | 2.75 | No | An Examination of the Compositionality of Large ... | 2023-08-21 | Code |
| 104 | LLaVA-7B (BERTScore) | 2.25 | No | An Examination of the Compositionality of Large ... | 2023-08-21 | Code |
| 105 | VisualBERT base | 1.5 | No | Winoground: Probing Vision and Language Models f... | 2022-04-07 | Code |