Instruction Following on LLaVA-Bench

Metric: avg score (higher is better)

Results

#	Model	avg score	Extra Data	Paper	Date	Code
1	CuMo-7B	85.7	No	CuMo: Scaling Multimodal LLM with Co-Upcycled Mi...	2024-05-09	Code
2	ShareGPT4V-13B	79.9	No	ShareGPT4V: Improving Large Multi-Modal Models w...	2023-11-21	Code
3	ShareGPT4V-7B	72.6	No	ShareGPT4V: Improving Large Multi-Modal Models w...	2023-11-21	Code
4	LLaVA-v1.5-13B	70.7	No	Improved Baselines with Visual Instruction Tuning	2023-10-05	Code
5	LLaVA-v1.5-7B	63.4	No	Improved Baselines with Visual Instruction Tuning	2023-10-05	Code
6	InstructBLIP-7B	60.9	No	InstructBLIP: Towards General-purpose Vision-Lan...	2023-05-11	Code
7	InstructBLIP-13B	58.2	No	InstructBLIP: Towards General-purpose Vision-Lan...	2023-05-11	Code
8	BLIP-2	38.1	No	BLIP-2: Bootstrapping Language-Image Pre-trainin...	2023-01-30	Code

#1 CuMo-7B (SOTA)
85.7 avg score · 2024-05-09
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

#2 ShareGPT4V-13B (SOTA)
79.9 avg score · 2023-11-21
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

#3 ShareGPT4V-7B
72.6 avg score · 2023-11-21
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

#4 LLaVA-v1.5-13B (SOTA)
70.7 avg score · 2023-10-05
Improved Baselines with Visual Instruction Tuning

#5 LLaVA-v1.5-7B
63.4 avg score · 2023-10-05
Improved Baselines with Visual Instruction Tuning

#6 InstructBLIP-7B (SOTA)
60.9 avg score · 2023-05-11
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

#7 InstructBLIP-13B
58.2 avg score · 2023-05-11
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

#8 BLIP-2 (SOTA)
38.1 avg score · 2023-01-30
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models