Visual Question Answering (VQA) on OK-VQA
Metric: Accuracy (higher is better)
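The page does not define how Accuracy is computed. A minimal sketch follows, assuming the leaderboard uses the common simplified VQA soft accuracy, where a prediction scores min(number of matching human answers / 3, 1). The function names and the lowercase/strip normalization are illustrative assumptions; official evaluation scripts apply more elaborate answer normalization and averaging.

```python
from typing import Sequence


def vqa_soft_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Score one prediction as min(matches / 3, 1).

    Assumption: simple lowercase/strip normalization stands in for the
    fuller normalization that official evaluation scripts perform.
    """
    normalized = prediction.strip().lower()
    matches = sum(1 for answer in human_answers
                  if answer.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


def dataset_accuracy(predictions: Sequence[str],
                     references: Sequence[Sequence[str]]) -> float:
    """Average per-question scores and report them as a percentage."""
    scores = [vqa_soft_accuracy(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this sketch, a prediction that matches three or more annotators earns full credit, and one that matches only two earns two thirds of a point.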
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | PaLI-X-VPD | 66.8 | No | Visual Program Distillation: Distilling Tools an... | 2023-12-05 | - |
| 2 | PaLM-E-562B | 66.1 | No | PaLM-E: An Embodied Multimodal Language Model | 2023-03-06 | Code |
| 3 | PaLI-X (Single-task FT) | 66.1 | No | PaLI-X: On Scaling up a Multilingual Vision and ... | 2023-05-29 | Code |
| 4 | PaLI 17B | 64.5 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 5 | Prophet | 62.5 | No | Prophet: Prompting Large Language Models with Co... | 2023-03-03 | Code |
| 6 | RA-VQA-v2 (BLIP 2) | 62.08 | No | Fine-grained Late-interaction Multi-modal Retrie... | 2023-09-29 | Code |
| 7 | A Simple Baseline for KB-VQA | 61.2 | No | A Simple Baseline for Knowledge-Based Visual Que... | 2023-10-20 | - |
| 8 | PromptCap | 60.4 | No | PromptCap: Prompt-Guided Task-Aware Image Captio... | 2022-11-15 | Code |
| 9 | ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | No | REVEAL: Retrieval-Augmented Visual-Language Pre-... | 2022-12-10 | Code |
| 10 | Lyrics | 58.2 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 11 | REVIVE (Ensemble) | 58 | No | REVIVE: Regional Visual Representation Matters i... | 2022-06-02 | Code |
| 12 | REVIVE (Single) | 56.6 | No | REVIVE: Regional Visual Representation Matters i... | 2022-06-02 | Code |
| 13 | RA-VQA-v2 (T5-large) | 54.85 | No | Fine-grained Late-interaction Multi-modal Retrie... | 2023-09-29 | Code |
| 14 | RA-VQA (T5-large) | 54.48 | No | Retrieval Augmented Visual Question Answering wi... | 2022-10-07 | Code |
| 15 | VK-OOD | 52.4 | No | - | - | Code |
| 16 | VK-OOD | 52.4 | No | - | - | Code |
| 17 | RA-VQA-FrDPR (T5-large) | 51.22 | No | Retrieval Augmented Visual Question Answering wi... | 2022-10-07 | Code |
| 18 | Flamingo80B | 50.6 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 19 | TRiG (T5-Large) | 50.5 | No | - | - | - |
| 20 | HYDRA | 48.6 | No | HYDRA: A Hyper Agent for Dynamic Compositional V... | 2024-03-19 | Code |
| 21 | PICa | 48 | Yes | An Empirical Study of GPT-3 for Few-Shot Knowled... | 2021-09-10 | Code |
| 22 | LaKo | 47.01 | No | LaKo: Knowledge-driven Visual Question Answering... | 2022-07-26 | Code |
| 23 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 45.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 24 | Flamingo9B | 44.7 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 25 | VLC-BERT | 43.1 | No | VLC-BERT: Visual Question Answering with Context... | 2022-10-24 | Code |
| 26 | T5(Tan and Bansal, 2019) + Prefixes | 42.03 | No | LaKo: Knowledge-driven Visual Question Answering... | 2022-07-26 | Code |
| 27 | Flamingo3B | 41.2 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 28 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 40.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 29 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 39.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 30 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 36.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 31 | PNP-VQA | 35.9 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Code |
| 32 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 31.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 33 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 30.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 34 | FewVLM | 16.5 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Code |
| 35 | MetaLM | 11.4 | No | Language Models are General-Purpose Interfaces | 2022-06-13 | Code |
| 36 | VLKD(ViT-B/16) | 10.5 | No | - | - | - |
| 37 | Frozen | 5.9 | No | Multimodal Few-Shot Learning with Frozen Languag... | 2021-06-25 | - |