Visual Question Answering (VQA) on VQA v2 val
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------|------------|-------|------|------|
| 1 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 82.19 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 81.59 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 81.55 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 4 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 5 | PNP-VQA | 63.3 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Code |
| 6 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 7 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 8 | LocVLM-L | 55.9 | No | Learning to Localize Objects Improves Spatial Re... | 2024-04-11 | Code |
| 9 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 10 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 11 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 12 | Few VLM (zero-shot) | 47.7 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Code |
| 13 | MetaLM | 41.1 | No | Language Models are General-Purpose Interfaces | 2022-06-13 | Code |
| 14 | VLKD(ViT-B/16) | 38.6 | No | - | - | - |
| 15 | Frozen | 29.5 | No | Multimodal Few-Shot Learning with Frozen Languag... | 2021-06-25 | - |