Visual Question Answering (VQA) on InfiMM-Eval

Metric: Deductive (higher is better)

Results

#	Model	Deductive	Extra Data	Paper	Date	Code
1	GPT-4V	74.86	No	GPT-4 Technical Report	2023-03-15	Code
2	SPHINX v2	42.17	No	SPHINX: The Joint Mixing of Weights, Tasks, and ...	2023-11-13	Code
3	Qwen-VL-Chat	37.55	No	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
4	CogVLM-Chat	36.75	No	CogVLM: Visual Expert for Pretrained Language Mo...	2023-11-06	Code
5	LLaVA-1.5	30.94	No	Improved Baselines with Visual Instruction Tuning	2023-10-05	Code
6	Emu	28.9	No	Emu: Generative Pretraining in Multimodality	2023-07-11	Code
7	LLaMA-Adapter V2	28.7	No	LLaMA-Adapter V2: Parameter-Efficient Visual Ins...	2023-04-28	Code
8	InstructBLIP	27.56	No	InstructBLIP: Towards General-purpose Vision-Lan...	2023-05-11	Code
9	InternLM-XComposer-VL	26.77	No	InternLM-XComposer: A Vision-Language Large Mode...	2023-09-26	Code
10	mPLUG-Owl2	23.43	No	mPLUG-Owl2: Revolutionizing Multi-modal Large La...	2023-11-07	Code
11	Otter	22.49	No	Otter: A Multi-Modal Model with In-Context Instr...	2023-05-05	Code
12	MiniGPT-v2	11.02	No	MiniGPT-4: Enhancing Vision-Language Understandi...	2023-04-20	Code
13	OpenFlamingo-v2	8.88	No	OpenFlamingo: An Open-Source Framework for Train...	2023-08-02	Code
14	BLIP-2-OPT2.7B	2.76	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code