Visual Question Answering (VQA) on A-OKVQA

Metric: DA VQA Score (higher is better)
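
The page does not define the metric. As a rough sketch, assuming the DA (direct answer) VQA Score follows the usual VQA soft-accuracy convention, a prediction earns full credit once it matches at least three of the human reference answers for a question, and partial credit otherwise. The function names and the toy data below are illustrative assumptions, not part of any official evaluation script.

```python
from typing import Dict, List


def da_vqa_score(prediction: str, reference_answers: List[str]) -> float:
    """Soft accuracy for one question (assumed convention: min(1, matches / 3))."""
    matches = sum(prediction == answer for answer in reference_answers)
    return min(1.0, matches / 3.0)


def mean_da_vqa_score(predictions: Dict[str, str],
                      references: Dict[str, List[str]]) -> float:
    """Average the per-question scores and report them as a percentage."""
    scores = [da_vqa_score(predictions[qid], references[qid]) for qid in references]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical toy data: two questions with a handful of reference answers each.
toy_references = {
    "q1": ["umbrella", "umbrella", "umbrella", "parasol"],
    "q2": ["red", "crimson", "maroon", "scarlet"],
}
toy_predictions = {"q1": "umbrella", "q2": "blue"}

toy_result = mean_da_vqa_score(toy_predictions, toy_references)
```

Under this assumption, a leaderboard value such as 70.55 would be the mean per-question score over the evaluation split, expressed as a percentage.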

Results

#	Model	DA VQA Score	Extra Data	SOTA	Paper	Date	Code
1	SMoLA-PaLI-X Specialist Model	70.55	Yes	Yes	Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	2023-12-01	-
2	PaLI-X-VPD	68.2	No	No	Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models	2023-12-05	-
3	PromptCap	59.6	No	Yes	PromptCap: Prompt-Guided Task-Aware Image Captioning	2022-11-15	Code
4	Prophet	58.5	No	No	Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering	2023-03-03	Code
5	A Simple Baseline for KB-VQA	57.5	No	No	A Simple Baseline for Knowledge-Based Visual Question Answering	2023-10-20	-
6	KRISP	42.2	No	Yes	KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA	2020-12-20	-
7	GPV-2	40.7	No	No	Webly Supervised Concept Expansion for General Purpose Vision Models	2022-02-04	-
8	VLC-BERT	38.05	No	No	VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge	2022-10-24	Code
9	LXMERT	25.9	No	No	LXMERT: Learning Cross-Modality Encoder Representations from Transformers	2019-08-20	Code
10	ViLBERT	25.9	No	Yes	ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019-08-06	Code
11	Pythia	21.9	No	Yes	Pythia v0.1: the Winning Entry to the VQA Challenge 2018	2018-07-26	Code
12	ViLBERT - VQA	12	No	No	ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019-08-06	Code
13	ViLBERT - OK-VQA	9.2	No	No	ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019-08-06	Code