Video Question Answering on EgoSchema (subset)
Metric: Accuracy (higher is better)
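As a rough illustration of the metric, accuracy is taken here to be the percentage of multiple-choice questions answered correctly. The sketch below assumes predictions and reference answers are given as option indices. The function name `accuracy_percent` and the sample data are hypothetical and are not taken from any official evaluation script.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count positions where the predicted option equals the reference option.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: 5 questions, 4 answered correctly.
preds = [0, 3, 2, 4, 1]
gold = [0, 3, 2, 4, 2]
print(accuracy_percent(preds, gold))  # 80.0
```

Under this definition, uniform guessing among five answer options would score about 20 in expectation, which is consistent with the Random baseline reported in the results.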
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | Tarsier (34B) | 68.6 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 2 | VideoChat-T (7B) | 68.4 | No | TimeSuite: Improving MLLMs for Long Video Unders... | 2024-10-25 | Code |
| 3 | LangRepo (12B) | 66.2 | No | Language Repository for Long Video Understanding | 2024-03-21 | Code |
| 4 | VideoTree (GPT4) | 66.2 | No | VideoTree: Adaptive Tree-based Video Representat... | 2024-05-29 | Code |
| 5 | LVNet | 66 | No | Too Many Frames, Not All Useful: Efficient Strat... | 2024-06-13 | Code |
| 6 | VideoChat2_HD_mistral | 65.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 7 | VideoChat2_mistral | 63.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 8 | MVU (13B) | 60.3 | No | Understanding Long Videos with Multimodal Langua... | 2024-03-25 | Code |
| 9 | TS-LLaVA-34B | 57.8 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 10 | LLoVi (GPT-3.5) | 57.6 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Code |
| 11 | LLoVi (7B) | 50.8 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Code |
| 12 | SlowFast-LLaVA-34B | 47.2 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 13 | SeViLA (4B) | 25.7 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |
| 14 | Random | 20 | No | CREPE: Can Vision-Language Foundation Models Rea... | 2022-12-13 | Code |