Metric: Confidence score (lower is better)
| # | Model | Confidence score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Video Chat | 2.2 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 2 | Video-ChatGPT | 2.7 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 3 | LLaMA Adapter V2 | 2.7 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 4 | MovieChat | 3.1 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 5 | VideoChat2 | 3.3 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 6 | LLaMA-VID-13B (2 Token) | 3.3 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Code |
| 7 | LLaMA-VID-7B (2 Token) | 3.3 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Code |
| 8 | Chat-UniVi-13B | 3.3 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 9 | Video-LLaVA | 3.3 | No | Video-LLaVA: Learning United Visual Representati... | 2023-11-16 | Code |
| 10 | BT-Adapter (zero-shot) | 3.6 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |