Visual Question Answering (VQA) on A-OKVQA
Metric: DA VQA Score (higher is better)
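The page does not define the metric. As a rough sketch, assuming the DA (direct answer) score follows the common VQA soft-accuracy convention, a prediction earns full credit once at least three annotators gave the same answer, and the reported number is the mean over questions scaled to 0-100. The function names and the dictionary layout below are hypothetical, and answer normalization (lowercasing, punctuation stripping) is assumed to have happened already.

```python
from typing import Dict, List


def da_vqa_score(prediction: str, reference_answers: List[str]) -> float:
    """Soft accuracy for one question (assumed convention: min(1, matches / 3))."""
    # Count how many annotator answers exactly match the prediction.
    matches = sum(prediction == ref for ref in reference_answers)
    return min(1.0, matches / 3.0)


def mean_da_vqa_score(
    predictions: Dict[str, str], references: Dict[str, List[str]]
) -> float:
    """Average the per-question scores and scale to a 0-100 range."""
    # A question with no prediction scores 0 (empty string matches nothing here).
    scores = [
        da_vqa_score(predictions.get(qid, ""), answers)
        for qid, answers in references.items()
    ]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two questions, ten annotator answers each.
    refs = {"q1": ["dog"] * 10, "q2": ["cat"] * 10}
    preds = {"q1": "dog", "q2": "bird"}
    print(mean_da_vqa_score(preds, refs))
```

Under this assumed convention a prediction matched by only one annotator scores one third, which is why leaderboard values are not simple exact-match percentages.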
Results
| # | Model | DA VQA Score | Extra Data | SOTA badge | Paper | Date | Code |
|---|-------|--------------|------------|------------|-------|------|------|
| 1 | SMoLA-PaLI-X Specialist Model | 70.55 | Yes | Yes | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | 2023-12-01 | - |
| 2 | PaLI-X-VPD | 68.2 | No | - | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | 2023-12-05 | - |
| 3 | PromptCap | 59.6 | No | Yes | PromptCap: Prompt-Guided Task-Aware Image Captioning | 2022-11-15 | Yes |
| 4 | Prophet | 58.5 | No | - | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | 2023-03-03 | Yes |
| 5 | A Simple Baseline for KB-VQA | 57.5 | No | - | A Simple Baseline for Knowledge-Based Visual Question Answering | 2023-10-20 | - |
| 6 | KRISP | 42.2 | No | Yes | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | 2020-12-20 | - |
| 7 | GPV-2 | 40.7 | No | - | Webly Supervised Concept Expansion for General Purpose Vision Models | 2022-02-04 | - |
| 8 | VLC-BERT | 38.05 | No | - | VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge | 2022-10-24 | Yes |
| 9 | LXMERT | 25.9 | No | - | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | 2019-08-20 | Yes |
| 10 | ViLBERT | 25.9 | No | Yes | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | 2019-08-06 | Yes |
| 11 | Pythia | 21.9 | No | Yes | Pythia v0.1: the Winning Entry to the VQA Challenge 2018 | 2018-07-26 | Yes |
| 12 | ViLBERT - VQA | 12 | No | - | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | 2019-08-06 | Yes |
| 13 | ViLBERT - OK-VQA | 9.2 | No | - | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | 2019-08-06 | Yes |