Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on VCR (Q-A) test

Metric: Accuracy (higher is better)


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | GPT4RoI | 89.4 | No | GPT4RoI: Instruction Tuning Large Language Model... | 2023-07-07 | Code |
| 2 | ERNIE-ViL-large (ensemble of 15 models) | 81.6 | No | ERNIE-ViL: Knowledge Enhanced Vision-Language Re... | 2020-06-30 | - |
| 3 | UNITER-large (10 ensemble) | 79.8 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 4 | MAD (Single Model, Formerly CLIP-TD) | 79.6 | No | Multimodal Adaptive Distillation for Leveraging ... | 2022-04-22 | - |
| 5 | UNITER (Large) | 77.3 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 6 | KVL-BERTLARGE | 76.4 | No | KVL-BERT: Knowledge Enhanced Visual-and-Linguist... | 2020-12-13 | - |
| 7 | VL-BERTLARGE | 75.8 | No | VL-BERT: Pre-training of Generic Visual-Linguist... | 2019-08-22 | Code |
| 8 | VL-T5 | 75.3 | No | Unifying Vision-and-Language Tasks via Text Gene... | 2021-02-04 | Code |
| 9 | VisualBERT | 71.6 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Code |
| 10 | OFA-X | 71.2 | No | Harnessing the Power of Multi-Task Pretraining f... | 2022-12-08 | Code |
| 11 | OFA-X-MT | 62 | No | Harnessing the Power of Multi-Task Pretraining f... | 2022-12-08 | Code |