VCGBench-Diverse on VideoInstruct

Metric: Spatial Understanding (higher is better)

Results

#	Model	Spatial Understanding	SOTA badge	Extra Data	Paper	Date	Code
1	VideoGPT+	2.8	Yes	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
2	VideoChat2	2.43	Yes	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
3	Chat-UniVi	2.36	Yes	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
4	BT-Adapter	2.35	Yes	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
5	VTimeLLM	2.29	No	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
6	Video-ChatGPT	2.25	Yes	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code