Metric: Spatial Understanding (higher is better)
| # | Model | Spatial Understanding | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VideoGPT+ | 2.8 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 2 | VideoChat2 | 2.43 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 3 | Chat-UniVi | 2.36 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 4 | BT-Adapter | 2.35 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 5 | VTimeLLM | 2.29 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 6 | Video-ChatGPT | 2.25 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |