Video Question Answering on MVBench
Metric: Avg. (higher is better)
Results
| # | Model | Avg. | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | LinVT-Qwen2-VL (7B) | 69.3 | No | LinVT: Empower Your Image-level Large Language M... | 2024-12-06 | Code |
| 2 | Tarsier (34B) | 67.6 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 3 | InternVideo2 | 67.2 | No | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 4 | LongVU (7B) | 66.9 | No | LongVU: Spatiotemporal Adaptive Compression for ... | 2024-10-22 | Code |
| 5 | Oryx (34B) | 64.7 | No | Oryx MLLM: On-Demand Spatial-Temporal Understand... | 2024-09-19 | Code |
| 6 | VideoLLaMA2 (72B) | 62 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 7 | VideoChat-T (7B) | 59.9 | No | TimeSuite: Improving MLLMs for Long Video Unders... | 2024-10-25 | Code |
| 8 | mPLUG-Owl3 (7B) | 59.5 | No | mPLUG-Owl3: Towards Long Image-Sequence Understa... | 2024-08-09 | Code |
| 9 | PPLLaVA (7B) | 59.2 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 10 | VideoGPT+ | 58.7 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 11 | PLLaVA | 58.1 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 12 | ST-LLM | 54.9 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 13 | VideoChat2 | 51.9 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 14 | HawkEye | 47.55 | No | HawkEye: Training Video-Text LLMs for Grounding ... | 2024-03-15 | Code |
| 15 | SPHINX-Plus | 39.7 | No | SPHINX-X: Scaling Data and Parameters for a Fami... | 2024-02-08 | Code |
| 16 | TimeChat | 38.5 | No | TimeChat: A Time-sensitive Multimodal Large Lang... | 2023-12-04 | Code |
| 17 | LLaVa | 36 | No | Visual Instruction Tuning | 2023-04-17 | Code |
| 18 | VideoChat | 35.5 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 19 | VideoLLaMA | 34.1 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |
| 20 | Video-ChatGPT | 32.7 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 21 | InstructBLIP | 32.5 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 22 | MiniGPT4 | 18.8 | No | MiniGPT-4: Enhancing Vision-Language Understandi... | 2023-04-20 | Code |