Visual Question Answering (VQA) on InfiMM-Eval

Metric: Abductive (higher is better)

Results

#	Model	Abductive	Extra Data	Paper	Date	Code
1	GPT-4V	77.88	No	GPT-4 Technical Report	2023-03-15	Code
2	SPHINX v2	49.85	No	SPHINX: The Joint Mixing of Weights, Tasks, and ...	2023-11-13	Code
3	LLaVA-1.5	47.91	No	Improved Baselines with Visual Instruction Tuning	2023-10-05	Code
4	CogVLM-Chat	47.88	No	CogVLM: Visual Expert for Pretrained Language Mo...	2023-11-06	Code
5	LLaMA-Adapter V2	46.12	No	LLaMA-Adapter V2: Parameter-Efficient Visual Ins...	2023-04-28	Code
6	Qwen-VL-Chat	44.39	No	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
7	InstructBLIP	37.76	No	InstructBLIP: Towards General-purpose Vision-Lan...	2023-05-11	Code
8	Emu	36.57	No	Emu: Generative Pretraining in Multimodality	2023-07-11	Code
9	InternLM-XComposer-VL	35.97	No	InternLM-XComposer: A Vision-Language Large Mode...	2023-09-26	Code
10	Otter	33.64	No	Otter: A Multi-Modal Model with In-Context Instr...	2023-05-05	Code
11	mPLUG-Owl2	20.6	No	mPLUG-Owl2: Revolutionizing Multi-modal Large La...	2023-11-07	Code
12	BLIP-2-OPT2.7B	18.96	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code
13	MiniGPT-v2	13.28	No	MiniGPT-4: Enhancing Vision-Language Understandi...	2023-04-20	Code
14	OpenFlamingo-v2	5.3	No	OpenFlamingo: An Open-Source Framework for Train...	2023-08-02	Code