Video Question Answering on TVBench
Metric: Average Accuracy (higher is better)
Results
| # | Model | Average Accuracy | Extra Data | Paper | Date | Code |
|---|-------|------------------|------------|-------|------|------|
| 1 | Seed1.5-VL thinking | 63.6 | No | Seed1.5-VL Technical Report | 2025-05-11 | - |
| 2 | PLM-8B | 63.5 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 3 | Seed1.5-VL | 61.5 | No | Seed1.5-VL Technical Report | 2025-05-11 | - |
| 4 | V-JEPA 2 ViT-g 8B | 60.6 | No | V-JEPA 2: Self-Supervised Video Models Enable Un... | 2025-06-11 | Code |
| 5 | PLM-3B | 58.9 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 6 | RRPO | 56.5 | No | Self-alignment of Large Video Language Models wi... | 2025-04-16 | - |
| 7 | Tarsier-34B | 55.5 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 8 | Tarsier2-7B | 54.7 | No | Tarsier2: Advancing Large Vision-Language Models... | 2025-01-14 | Code |
| 9 | Qwen2-VL-72B | 52.7 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 10 | IXC-2.5 7B | 51.6 | No | InternLM-XComposer-2.5: A Versatile Large Vision... | 2024-07-03 | Code |
| 11 | Aria | 51 | No | Aria: An Open Multimodal Native Mixture-of-Exper... | 2024-10-08 | Code |
| 12 | PLM-1B | 50.4 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 13 | LLaVA-Video 72B | 50 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 14 | VideoLLaMA2 72B | 48.4 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 15 | Gemini 1.5 Pro | 47.6 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 16 | Tarsier-7B | 46.9 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 17 | LLaVA-Video 7B | 45.6 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 18 | Qwen2-VL-7B | 43.8 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 19 | VideoLLaMA2 7B | 42.9 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 20 | PLLaVA-34B | 42.3 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 21 | mPLUG-Owl3 | 42.2 | No | mPLUG-Owl3: Towards Long Image-Sequence Understa... | 2024-08-09 | Code |
| 22 | VideoLLaMA2.1 | 42.1 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 23 | VideoGPT+ | 41.7 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 24 | GPT4o 8 frames | 39.9 | No | GPT-4o System Card | 2024-10-25 | - |
| 25 | PLLaVA-13B | 36.4 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 26 | ST-LLM | 35.7 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 27 | VideoChat2 | 35 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 28 | PLLaVA-7B | 34.9 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |