Video Question Answering on TGIF-QA

Metric: Confidence Score (lower is better)

Results

#	Model	Confidence Score	Extra Data	Paper	Date	Code
1	Video Chat-7B	2.3	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Yes
2	Video-ChatGPT-7B	3	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Yes
3	Elysium	3.6	No	Elysium: Exploring Object-level Perception in Videos via MLLM	2024-03-25	Yes
4	Chat-UniVi-7B	3.8	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Yes
5	Video-LLaVA-7B	4	No	Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	2023-11-16	Yes
6	VideoGPT+	4.1	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Yes
7	TS-LLaVA-34B	4.2	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Yes
8	IG-VLM	4.2	No	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	2024-03-27	Yes
9	LinVT-Qwen2-VL (7B)	4.3	No	LinVT: Empower Your Image-level Large Language Model to Understand Videos	2024-12-06	Yes
10	PLLaVA	4.3	No	PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Yes
11	SlowFast-LLaVA-34B	4.3	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Yes
12	Tarsier (34B)	4.4	No	Tarsier: Recipes for Training and Evaluating Large Video Description Models	2024-06-30	Yes