Metric: Average Score on VLM2-bench (9 subtasks) (higher is better)
| # | Model | Average Score on VLM2-bench (9 subtasks) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4o | 60.36 | No | GPT-4o System Card | 2024-10-25 | - |
| 2 | Qwen2.5-VL-7B | 54.82 | No | Qwen2.5-VL Technical Report | 2025-02-19 | Code |
| 3 | InternVL2.5-26B | 45.59 | No | Expanding Performance Boundaries of Open-Source ... | 2024-12-06 | Code |
| 4 | LLaVA-Video-7B | 43.32 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 5 | Qwen2-VL-7B | 42.37 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 6 | InternVL2.5-8B | 41.23 | No | Expanding Performance Boundaries of Open-Source ... | 2024-12-06 | Code |
| 7 | LLaVA-OneVision-7B | 39.35 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 8 | mPLUG-Owl3-7B | 37.85 | No | mPLUG-Owl3: Towards Long Image-Sequence Understa... | 2024-08-09 | Code |
| 9 | LongVA-7B | 22.59 | No | Long Context Transfer from Language to Vision | 2024-06-24 | Code |