MMR total on MRR-Benchmark
Metric: Total Column Score (higher is better)
Results
| # | Model | Total Column Score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet | 463 | Yes | - | - | - |
| 2 | GPT-4o | 457 | Yes | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | 2024-06-14 | - |
| 3 | GPT-4V | 415 | Yes | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | 2023-09-29 | Code |
| 4 | LLaVA-NEXT-34B | 412 | Yes | Visual Instruction Tuning | 2023-04-17 | Code |
| 5 | Phi-3-Vision | 397 | Yes | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | 2024-04-22 | - |
| 6 | InternVL2-8B | 368 | Yes | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 2023-12-21 | Code |
| 7 | Qwen-vl-max | 366 | Yes | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | 2023-08-24 | Code |
| 8 | LLaVA-NEXT-13B | 335 | Yes | Visual Instruction Tuning | 2023-04-17 | Code |
| 9 | Qwen-vl-plus | 310 | Yes | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | 2023-08-24 | Code |
| 10 | Idefics-2-8B | 256 | Yes | What matters when building vision-language models? | 2024-05-03 | - |
| 11 | LLaVA-1.5-13B | 243 | Yes | Visual Instruction Tuning | 2023-04-17 | Code |
| 12 | InternVL2-1B | 237 | Yes | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 2023-12-21 | Code |
| 13 | Monkey-Chat-7B | 214 | Yes | Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | 2023-11-11 | Code |
| 14 | Idefics-80B | 139 | Yes | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents | 2023-06-21 | Code |