VCGBench-Diverse on VideoInstruct

Metric: Contextual Understanding (higher is better)

Results

#	Model	Contextual Understanding	Extra Data	Paper	Date	Code
1	VideoGPT+	2.81	No	VideoGPT+: Integrating Image and Video Encoders ...	2024-06-13	Code
2	Chat-UniVi	2.66	No	Chat-UniVi: Unified Visual Representation Empowe...	2023-11-14	Code
3	BT-Adapter	2.59	No	BT-Adapter: Video Conversation is Feasible Witho...	2023-09-27	Code
4	VideoChat2	2.51	No	MVBench: A Comprehensive Multi-modal Video Under...	2023-11-28	Code
5	VTimeLLM	2.48	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
6	Video-ChatGPT	2.46	No	Video-ChatGPT: Towards Detailed Video Understand...	2023-06-08	Code

#1 VideoGPT+ (SOTA)
2.81
Contextual Understanding · 2024-06-13
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

#2 Chat-UniVi (SOTA)
2.66
Contextual Understanding · 2023-11-14
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

#3 BT-Adapter (SOTA)
2.59
Contextual Understanding · 2023-09-27
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

#4 VideoChat2
2.51
Contextual Understanding · 2023-11-28
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

#5 VTimeLLM
2.48
Contextual Understanding · 2023-11-30
VTimeLLM: Empower LLM to Grasp Video Moments

#6 Video-ChatGPT (SOTA)
2.46
Contextual Understanding · 2023-06-08
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models