Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date |
|---|---|---|---|---|---|
| 1 | BIMBA-LLaVA-Qwen2-7B | 77.88 | No | BIMBA: Selective-Scan Compression for Long-Range... | 2025-03-12 |
| 2 | Gemini | 66.7 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 |
| 3 | LLaVA-OneVision-72B | 58.7 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 |
| 4 | LLaVA-OneVision-7B | 51.8 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 |
| 5 | Qwen2-VL-7B | 33.9 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 |
| 6 | LLaVA-NeXT-Video-7B | 20.1 | No | LLaVA-NeXT-Interleave: Tackling Multi-image, Vid... | 2024-07-10 |
| 7 | VideoChat2 | 12.4 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 |
| 8 | VideoLLaMA2 | 4.5 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 |
| 9 | VideoChatGPT | 4.1 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 |