MMR total on MRR-Benchmark

Metric: Total Column Score (higher is better)

Results

#	Model	Total Column Score	Extra Data	Paper	Date	Code
1	Claude 3.5 Sonnet	463	Yes	-	-	-
2	GPT-4o	457	Yes	GPT-4o: Visual perception performance of multimo...	2024-06-14	-
3	GPT-4V	415	Yes	The Dawn of LMMs: Preliminary Explorations with ...	2023-09-29	Code
4	LLaVA-NEXT-34B	412	Yes	Visual Instruction Tuning	2023-04-17	Code
5	Phi-3-Vision	397	Yes	Phi-3 Technical Report: A Highly Capable Languag...	2024-04-22	-
6	InternVL2-8B	368	Yes	InternVL: Scaling up Vision Foundation Models an...	2023-12-21	Code
7	Qwen-vl-max	366	Yes	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
8	LLaVA-NEXT-13B	335	Yes	Visual Instruction Tuning	2023-04-17	Code
9	Qwen-vl-plus	310	Yes	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
10	Idefics-2-8B	256	Yes	What matters when building vision-language models?	2024-05-03	-
11	LLaVA-1.5-13B	243	Yes	Visual Instruction Tuning	2023-04-17	Code
12	InternVL2-1B	237	Yes	InternVL: Scaling up Vision Foundation Models an...	2023-12-21	Code
13	Monkey-Chat-7B	214	Yes	Monkey: Image Resolution and Text Label Are Impo...	2023-11-11	Code
14	Idefics-80B	139	Yes	OBELICS: An Open Web-Scale Filtered Dataset of I...	2023-06-21	Code

#1 Claude 3.5 Sonnet
Total Column Score: 463 · Extra Data
No paper

#2 GPT-4o (SOTA)
Total Column Score: 457 · Extra Data · 2024-06-14
Paper: GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding

#3 GPT-4V (SOTA)
Total Column Score: 415 · Extra Data · 2023-09-29
Paper: The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) · Code

#4 LLaVA-NEXT-34B (SOTA)
Total Column Score: 412 · Extra Data · 2023-04-17
Paper: Visual Instruction Tuning · Code

#5 Phi-3-Vision
Total Column Score: 397 · Extra Data · 2024-04-22
Paper: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

#6 InternVL2-8B
Total Column Score: 368 · Extra Data · 2023-12-21
Paper: InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks · Code

#7 Qwen-vl-max
Total Column Score: 366 · Extra Data · 2023-08-24
Paper: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · Code

#8 LLaVA-NEXT-13B
Total Column Score: 335 · Extra Data · 2023-04-17
Paper: Visual Instruction Tuning · Code

#9 Qwen-vl-plus
Total Column Score: 310 · Extra Data · 2023-08-24
Paper: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · Code

#10 Idefics-2-8B
Total Column Score: 256 · Extra Data · 2024-05-03
Paper: What matters when building vision-language models?

#11 LLaVA-1.5-13B
Total Column Score: 243 · Extra Data · 2023-04-17
Paper: Visual Instruction Tuning · Code

#12 InternVL2-1B
Total Column Score: 237 · Extra Data · 2023-12-21
Paper: InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks · Code

#13 Monkey-Chat-7B
Total Column Score: 214 · Extra Data · 2023-11-11
Paper: Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models · Code

#14 Idefics-80B
Total Column Score: 139 · Extra Data · 2023-06-21
Paper: OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents · Code