Visual Question Answering (VQA) on VQA v2 test-std

Metric: other, i.e. accuracy (%) on the "other" answer type (higher is better)

Results

#	Model	other	Extra Data	Paper	Date	Code	SOTA badge
1	mPLUG-Huge	77.02	No	mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections	2022-05-24	Code	SOTA
2	ONE-PEACE	74.15	No	ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	2023-05-18	Code	-
3	OFA	73.35	No	OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework	2022-02-07	Code	SOTA
4	VLMo	72.87	No	VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts	2021-11-03	Code	SOTA
5	Prismer	69.7	No	Prismer: A Vision-Language Model with Multi-Task Experts	2023-03-04	Code	-
6	MSR + MS Cog. Svcs., X10 models	67.87	No	VinVL: Revisiting Visual Representations in Vision-Language Models	2021-01-02	Code	SOTA
7	MSR + MS Cog. Svcs.	66.68	No	VinVL: Revisiting Visual Representations in Vision-Language Models	2021-01-02	Code	-
8	BGN, ensemble	66.28	No	Bilinear Graph Networks for Visual Question Answering	2019-07-23	-	SOTA
9	ERNIE-ViL-single model	65.24	No	ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph	2020-06-30	-	-
10	Single, w/o VLP	64.77	No	In Defense of Grid Features for Visual Question Answering	2020-01-10	Code	-
11	Single, w/o VLP	63.78	No	Deep Multimodal Neural Architecture Search	2020-04-25	Code	-
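
The SOTA markers in the listing above appear consistent with a simple running-best rule: an entry is marked when its score exceeds every listed score dated on or before it. The page does not state this rule, so the Python sketch below is an assumption checked only against these eleven rows; `ROWS` and `running_best` are hypothetical names introduced for illustration.

```python
from datetime import date

# (model, score on the "other" metric, date) copied from the leaderboard rows.
ROWS = [
    ("mPLUG-Huge", 77.02, date(2022, 5, 24)),
    ("ONE-PEACE", 74.15, date(2023, 5, 18)),
    ("OFA", 73.35, date(2022, 2, 7)),
    ("VLMo", 72.87, date(2021, 11, 3)),
    ("Prismer", 69.7, date(2023, 3, 4)),
    ("MSR + MS Cog. Svcs., X10 models", 67.87, date(2021, 1, 2)),
    ("MSR + MS Cog. Svcs.", 66.68, date(2021, 1, 2)),
    ("BGN, ensemble", 66.28, date(2019, 7, 23)),
    ("ERNIE-ViL-single model", 65.24, date(2020, 6, 30)),
    ("Single, w/o VLP", 64.77, date(2020, 1, 10)),
    ("Single, w/o VLP", 63.78, date(2020, 4, 25)),
]


def running_best(rows):
    """Return the rows whose score beats every earlier-dated (or same-day) score.

    Assumed rule, not documented by the leaderboard itself.
    """
    best = float("-inf")
    flagged = []
    # Walk the rows in date order; on a shared date, visit the higher score
    # first so that only it can set a new best for that day.
    for model, score, day in sorted(rows, key=lambda r: (r[2], -r[1])):
        if score > best:
            best = score
            flagged.append((model, score, day))
    return flagged


if __name__ == "__main__":
    for model, score, day in running_best(ROWS):
        print(f"{day.isoformat()}  {score:6.2f}  {model}")
```

Run on these rows, the rule flags the entries ranked 8, 6, 4, 3 and 1, in date order, which matches the markers shown. Entries dated after 2022-05-24 are not flagged because neither exceeds 77.02.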