Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Reasoning on NLVR2 Test

Metric: Accuracy (higher is better)
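For clarity, the Accuracy metric here is the fraction of NLVR2 test statements a model classifies correctly: each example pairs two images with a natural-language statement labelled true or false. A minimal sketch (the function name and the toy predictions are hypothetical, not from any listed model):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# Toy example with made-up binary predictions:
preds = [True, False, True, True]
golds = [True, False, False, True]
print(f"{accuracy(preds, golds) * 100:.2f}")  # 75.00
```

A leaderboard entry of 92.58 therefore means 92.58% of test statements were judged correctly.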


Results

| #  | Model          | Accuracy | Extra Data | Paper                                                | Date       | Code |
|----|----------------|----------|------------|------------------------------------------------------|------------|------|
| 1  | BEiT-3         | 92.58    | No         | Image as a Foreign Language: BEiT Pretraining fo...  | 2022-08-22 | Code |
| 2  | X2-VLM (large) | 89.4     | No         | X$^2$-VLM: All-In-One Pre-trained Model For Visi...  | 2022-11-22 | Code |
| 3  | XFM (base)     | 88.4     | No         | Toward Building General Foundation Models for La...  | 2023-01-12 | Code |
| 4  | CoCa           | 87       | No         | CoCa: Contrastive Captioners are Image-Text Foun...  | 2022-05-04 | Code |
| 5  | X2-VLM (base)  | 87       | No         | X$^2$-VLM: All-In-One Pre-trained Model For Visi...  | 2022-11-22 | Code |
| 6  | VLMo           | 86.86    | No         | VLMo: Unified Vision-Language Pre-Training with ...  | 2021-11-03 | Code |
| 7  | SimVLM         | 85.15    | No         | SimVLM: Simple Visual Language Model Pretraining...  | 2021-08-24 | Code |
| 8  | X-VLM (base)   | 84.76    | No         | Multi-Grained Vision Language Pre-Training: Alig...  | 2021-11-16 | Code |
| 9  | BLIP-129M      | 83.09    | No         | BLIP: Bootstrapping Language-Image Pre-training ...  | 2022-01-28 | Code |
| 10 | ALBEF (14M)    | 82.55    | No         | Align before Fuse: Vision and Language Represent...  | 2021-07-16 | Code |
| 11 | UNITER (Large) | 79.5     | No         | UNITER: UNiversal Image-TExt Representation Lear...  | 2019-09-25 | Code |
| 12 | SOHO           | 77.32    | No         | Seeing Out of tHe bOx: End-to-End Pre-training f...  | 2021-04-07 | Code |
| 13 | LXMERT         | 76.2     | No         | LXMERT: Learning Cross-Modality Encoder Represen...  | 2019-08-20 | Code |
| 14 | ViLT-B/32      | 76.13    | No         | ViLT: Vision-and-Language Transformer Without Co...  | 2021-02-05 | Code |