Video Question Answering on TGIF-QA
Metric: Accuracy (higher is better)
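As a rough illustration of the metric, accuracy is the percentage of questions answered correctly. The page does not say how a predicted answer is matched against the reference, so the sketch below assumes a case-insensitive exact match; the function name and the sample answers are hypothetical and are not taken from TGIF-QA.

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Assumption: case-insensitive exact match after trimming whitespace.
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical answers, for illustration only (not TGIF-QA data).
preds = ["cat", "Dog ", "two", "red"]
refs = ["cat", "dog", "three", "red"]
score = accuracy(preds, refs)
```

Individual papers on the leaderboard may use a different matching rule, so scores computed this way are not guaranteed to be comparable with the reported numbers.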
Results
| # | Model | Accuracy | SOTA | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | Tarsier (34B) | 82.5 | SOTA | No | Tarsier: Recipes for Training and Evaluating Large Video Description Models | 2024-06-30 | Code |
| 2 | LinVT-Qwen2-VL (7B) | 81.3 | | No | LinVT: Empower Your Image-level Large Language Model to Understand Videos | 2024-12-06 | Code |
| 3 | TS-LLaVA-34B | 81 | | No | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | 2024-11-17 | Code |
| 4 | PLLaVA | 80.6 | SOTA | No | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | 2024-04-25 | Code |
| 5 | SlowFast-LLaVA-34B | 80.6 | | No | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | 2024-07-22 | Code |
| 6 | IG-VLM | 79.1 | SOTA | No | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | 2024-03-27 | Code |
| 7 | VideoGPT+ | 74.6 | | No | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | 2024-06-13 | Code |
| 8 | MiniGPT4-video-7B | 72.22 | | No | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | 2024-04-04 | Code |
| 9 | Video-LLaVA-7B | 70 | SOTA | No | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | 2023-11-16 | Code |
| 10 | Chat-UniVi-7B | 69 | SOTA | No | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | 2023-11-14 | Code |
| 11 | Elysium | 66.6 | | No | Elysium: Exploring Object-level Perception in Videos via MLLM | 2024-03-25 | Code |
| 12 | LocVLM-Vid-B | 51.8 | | No | Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | 2024-04-11 | Code |
| 13 | Video-ChatGPT-7B | 51.4 | SOTA | No | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 2023-06-08 | Code |
| 14 | FrozenBiLM | 41.9 | SOTA | No | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | 2022-06-16 | Code |
| 15 | Video Chat-7B | 34.4 | | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |