Metric: avg score (higher is better)
| # | Model | avg score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CuMo-7B | 85.7 | No | CuMo: Scaling Multimodal LLM with Co-Upcycled Mi... | 2024-05-09 | Code |
| 2 | ShareGPT4V-13B | 79.9 | No | ShareGPT4V: Improving Large Multi-Modal Models w... | 2023-11-21 | Code |
| 3 | ShareGPT4V-7B | 72.6 | No | ShareGPT4V: Improving Large Multi-Modal Models w... | 2023-11-21 | Code |
| 4 | LLaVA-v1.5-13B | 70.7 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
| 5 | LLaVA-v1.5-7B | 63.4 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
| 6 | InstructBLIP-7B | 60.9 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 7 | InstructBLIP-13B | 58.2 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 8 | BLIP-2 | 38.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |