| 1 | PPLLaVA-7B | 4.21 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 2 | PLLaVA-34B | 3.90 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 3 | TS-LLaVA-34B | 3.86 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 4 | PPLLaVA-7B | 3.85 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 5 | SlowFast-LLaVA-34B | 3.84 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 6 | PPLLaVA-7B | 3.81 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 7 | ST-LLM | 3.74 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 8 | VideoGPT+ | 3.74 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 9 | TS-LLaVA-34B | 3.69 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 10 | VideoChat2_HD_mistral | 3.64 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 11 | PLLaVA-34B | 3.60 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 12 | MiniGPT4-video-7B | 3.57 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 13 | SlowFast-LLaVA-34B | 3.57 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 14 | PPLLaVA-7B | 3.56 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 15 | TS-LLaVA-34B | 3.55 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 16 | VideoChat2 | 3.51 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 17 | SlowFast-LLaVA-34B | 3.48 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 18 | Chat-UniVi | 3.46 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 19 | VTimeLLM | 3.40 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 20 | VideoChat2_HD_mistral | 3.40 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 21 | VideoGPT+ | 3.39 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 22 | BT-Adapter | 3.27 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 23 | VideoGPT+ | 3.27 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 24 | PLLaVA-34B | 3.25 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 25 | ST-LLM | 3.23 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 26 | PPLLaVA-7B | 3.21 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 27 | PLLaVA-34B | 3.20 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 28 | VideoGPT+ | 3.18 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 29 | VTimeLLM | 3.10 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 30 | MiniGPT4-video-7B | 3.08 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 31 | ST-LLM | 3.05 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 32 | TS-LLaVA-34B | 3.03 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 33 | VideoChat2 | 3.02 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 34 | MiniGPT4-video-7B | 3.02 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 35 | MovieChat | 3.01 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 36 | SlowFast-LLaVA-34B | 2.96 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 37 | MovieChat | 2.93 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 38 | ST-LLM | 2.93 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 39 | Chat-UniVi | 2.91 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 40 | BT-Adapter (zero-shot) | 2.89 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 41 | Chat-UniVi | 2.89 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 42 | VideoChat2 | 2.88 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 43 | VideoChat2_HD_mistral | 2.86 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 44 | VideoGPT+ | 2.83 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 45 | Chat-UniVi | 2.81 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 46 | VideoChat2 | 2.81 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 47 | ST-LLM | 2.81 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 48 | VTimeLLM | 2.78 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 49 | SlowFast-LLaVA-34B | 2.77 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 50 | TS-LLaVA-34B | 2.77 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 51 | MovieChat | 2.76 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 52 | BT-Adapter | 2.69 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 53 | BT-Adapter | 2.68 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 54 | PLLaVA-34B | 2.67 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 55 | MiniGPT4-video-7B | 2.67 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 56 | VideoChat2 | 2.66 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 57 | MiniGPT4-video-7B | 2.65 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 58 | VideoChat2_HD_mistral | 2.65 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 59 | Video-ChatGPT | 2.62 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 60 | VideoChat2_HD_mistral | 2.62 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 61 | Video Chat | 2.53 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 62 | Video-ChatGPT | 2.52 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 63 | Video Chat | 2.50 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 64 | VTimeLLM | 2.49 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 65 | VTimeLLM | 2.47 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 66 | BT-Adapter (zero-shot) | 2.46 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 67 | BT-Adapter | 2.46 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 68 | MovieChat | 2.42 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 69 | Video-ChatGPT | 2.40 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 70 | Chat-UniVi | 2.39 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 71 | Video-ChatGPT | 2.37 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 72 | BT-Adapter | 2.34 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 73 | Video Chat | 2.32 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 74 | LLaMA Adapter | 2.32 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 75 | LLaMA Adapter | 2.30 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 76 | MovieChat | 2.24 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 77 | Video Chat | 2.24 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 78 | BT-Adapter (zero-shot) | 2.20 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 79 | Video LLaMA | 2.18 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |
| 80 | Video LLaMA | 2.16 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |
| 81 | BT-Adapter (zero-shot) | 2.16 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 82 | LLaMA Adapter | 2.15 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 83 | BT-Adapter (zero-shot) | 2.13 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 84 | LLaMA Adapter | 2.03 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 85 | Video-ChatGPT | 1.98 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 86 | LLaMA Adapter | 1.98 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 87 | Video LLaMA | 1.96 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |
| 88 | Video Chat | 1.94 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 89 | Video LLaMA | 1.82 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |
| 90 | Video LLaMA | 1.79 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |