Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Question Answering (VQA) on VCR (Q-AR) test

Metric: Accuracy (higher is better)


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | GPT4RoI | 81.6 | No | GPT4RoI: Instruction Tuning Large Language Model... | 2023-07-07 | Code |
| 2 | ERNIE-ViL-large (ensemble of 15 models) | 70.5 | No | ERNIE-ViL: Knowledge Enhanced Vision-Language Re... | 2020-06-30 | - |
| 3 | UNITER (Large) | 62.8 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 4 | KVL-BERTLARGE | 60.3 | No | KVL-BERT: Knowledge Enhanced Visual-and-Linguist... | 2020-12-13 | - |
| 5 | VL-BERTLARGE | 59.7 | No | VL-BERT: Pre-training of Generic Visual-Linguist... | 2019-08-22 | Code |
| 6 | VL-T5 | 58.9 | No | Unifying Vision-and-Language Tasks via Text Gene... | 2021-02-04 | Code |
| 7 | VisualBERT | 52.4 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Code |