Video Question Answering on TGIF-QA
Metric: Confidence Score (lower is better)
Results
| # | Model | Confidence Score | Extra Data | Paper | Date | Code |
|---|-------|------------------|------------|-------|------|------|
| 1 (SOTA) | Video Chat-7B | 2.3 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 2 | Video-ChatGPT-7B | 3 | No | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 2023-06-08 | Code |
| 3 | Elysium | 3.6 | No | Elysium: Exploring Object-level Perception in Videos via MLLM | 2024-03-25 | Code |
| 4 | Chat-UniVi-7B | 3.8 | No | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | 2023-11-14 | Code |
| 5 | Video-LLaVA-7B | 4 | No | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | 2023-11-16 | Code |
| 6 | VideoGPT+ | 4.1 | No | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | 2024-06-13 | Code |
| 7 | TS-LLaVA-34B | 4.2 | No | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | 2024-11-17 | Code |
| 8 | IG-VLM | 4.2 | No | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | 2024-03-27 | Code |
| 9 | LinVT-Qwen2-VL (7B) | 4.3 | No | LinVT: Empower Your Image-level Large Language Model to Understand Videos | 2024-12-06 | Code |
| 10 | PLLaVA | 4.3 | No | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | 2024-04-25 | Code |
| 11 | SlowFast-LLaVA-34B | 4.3 | No | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | 2024-07-22 | Code |
| 12 | Tarsier (34B) | 4.4 | No | Tarsier: Recipes for Training and Evaluating Large Video Description Models | 2024-06-30 | Code |