Visual Question Answering on VQA v2 test-dev

Metric: Accuracy (higher is better)
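The leaderboard ranks entries by VQA accuracy. The sketch below shows how a per-question score is commonly computed. It assumes the standard VQA rule: an answer counts as fully correct once at least 3 annotators gave it, and the score is averaged over every leave-one-out subset of the human answers (10 answers per question in VQA v2). The function names `vqa_accuracy` and `dataset_accuracy` are hypothetical. The answer normalization that the official evaluation script applies (punctuation, articles, number words) is deliberately omitted, so inputs are assumed to be normalized already.

```python
from typing import Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Per-question VQA-style accuracy (sketch; assumes pre-normalized answers).

    For each way of leaving one annotator out, count how many of the
    remaining annotators gave `prediction`; 3 or more matches score 1.0,
    fewer score matches / 3. The final score is the mean over all subsets.
    """
    n = len(human_answers)
    if n == 0:
        raise ValueError("need at least one human answer")
    total = 0.0
    for left_out in range(n):
        matches = sum(
            1 for i, ans in enumerate(human_answers)
            if i != left_out and ans == prediction
        )
        total += min(matches / 3.0, 1.0)
    return total / n


def dataset_accuracy(predictions: Sequence[str],
                     answers: Sequence[Sequence[str]]) -> float:
    """Mean per-question accuracy, reported as a percentage like the table."""
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, answers)]
    return 100.0 * sum(scores) / len(scores)


# Example: 3 annotators said "red", 7 said "maroon".
answers = ["red"] * 3 + ["maroon"] * 7
print(vqa_accuracy("red", answers))     # ≈ 0.9
print(vqa_accuracy("maroon", answers))  # 1.0
```

Under this rule, a minority answer that only 3 of 10 annotators gave still earns most of the credit. This is one reason why top scores in the table cluster in the low 80s rather than approaching 100.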

Results

#	Model	Accuracy (%)	Extra Data	Paper	Date	Code
1	BLIP-2 ViT-G OPT 6.7B (fine-tuned)	82.3	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code
2	CoCa	82.3	No	CoCa: Contrastive Captioners are Image-Text Foun...	2022-05-04	Code
3	OFA	82	No	OFA: Unifying Architectures, Tasks, and Modaliti...	2022-02-07	Code
4	BLIP-2 ViT-G OPT 2.7B (fine-tuned)	81.74	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code
5	BLIP-2 ViT-G FlanT5 XL (fine-tuned)	81.66	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code
6	mPLUG-2	81.11	No	mPLUG-2: A Modularized Multi-modal Foundation Mo...	2023-02-01	Code
7	Florence	80.16	No	Florence: A New Foundation Model for Computer Vi...	2021-11-22	Code
8	Aurora (ours, r=64)	77.69	No	-	-	-
9	VK-OOD	76.8	No	Differentiable Outlier Detection Enable Robust D...	2023-02-11	Code
10	LXMERT (low-magnitude pruning)	70.72	No	LXMERT Model Compression for Visual Question Ans...	2023-10-23	Code
11	LocVLM-L	56.2	No	Learning to Localize Objects Improves Spatial Re...	2024-04-11	Code