Visual Question Answering (VQA) on GQA test-dev
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------|------------|-------|------|------|
| 1 | CFR | 72.1 | No | Coarse-to-Fine Reasoning for Visual Question Ans... | 2021-10-06 | Code |
| 2 | PaLI-X-VPD | 67.3 | No | Visual Program Distillation: Distilling Tools an... | 2023-12-05 | - |
| 3 | CuMo-7B | 64.9 | Yes | CuMo: Scaling Multimodal LLM with Co-Upcycled Mi... | 2024-05-09 | Code |
| 4 | Video-LaVIT | 64.4 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 5 | NSM | 62.95 | No | Learning by Abstraction: The Neural State Machine | 2019-07-09 | Code |
| 6 | Lyrics | 62.4 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 7 | LXMERT (Pre-train + scratch) | 60 | No | LXMERT: Learning Cross-Modality Encoder Represen... | 2019-08-20 | Code |
| 8 | single-hop + LCGN (ours) | 55.8 | No | Language-Conditioned Graph Networks for Relation... | 2019-05-10 | Code |
| 9 | HYDRA | 47.9 | No | HYDRA: A Hyper Agent for Dynamic Compositional V... | 2024-03-19 | Code |
| 10 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 44.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 11 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 44.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 12 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 44.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 13 | PNP-VQA | 41.9 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Code |
| 14 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 36.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 15 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 34.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 16 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 33.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 17 | FewVLM (zero-shot) | 29.3 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Code |