Video Question Answering on IntentQA
Metric: Accuracy (higher is better)
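Accuracy is the percentage of questions answered correctly. The following is a minimal sketch under a few assumptions. Each IntentQA question is treated as multiple choice with five candidate answers, which matches the Random baseline's score of 20. Predictions and gold answers are given as option indices. The function names and toy data are hypothetical and do not come from the benchmark's official evaluation script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Percentage of questions whose predicted option index equals the gold index."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (%) of uniform random guessing among num_options candidates."""
    return 100.0 / num_options


# Hypothetical toy data: gold and predicted option indices (0-4) for five questions.
gold = [0, 3, 1, 4, 2]
pred = [0, 3, 2, 4, 2]
print(accuracy(pred, gold))  # 80.0 (4 of 5 correct)
print(chance_accuracy(5))    # 20.0, assuming five answer candidates
```

Under the five-option assumption, uniform guessing has an expected accuracy of 100 / 5 = 20%. Every model in the table should be read against that floor.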
Results
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ENTER | 71.5 | No | ENTER: Event Based Interpretable Reasoning for V... | 2025-01-24 | No |
| 2 | LVNet | 71.1 | No | Too Many Frames, Not All Useful: Efficient Strat... | 2024-06-13 | Yes |
| 3 | TS-LLaVA-34B | 67.9 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Yes |
| 4 | VidCtx (7B) | 67.1 | No | VidCtx: Context-aware Video Question Answering w... | 2024-12-23 | Yes |
| 5 | VideoTree (GPT4) | 66.9 | No | VideoTree: Adaptive Tree-based Video Representat... | 2024-05-29 | Yes |
| 6 | IG-VLM | 65.3 | No | An Image Grid Can Be Worth a Video: Zero-shot Vi... | 2024-03-27 | Yes |
| 7 | LLoVi (GPT-4) | 64.0 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Yes |
| 8 | SeViLA (4B) | 60.9 | Yes | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Yes |
| 9 | SlowFast-LLaVA-34B | 60.1 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Yes |
| 10 | LangRepo (12B) | 59.1 | No | Language Repository for Long Video Understanding | 2024-03-21 | Yes |
| 11 | LLoVi (7B) | 53.6 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Yes |
| 12 | Mistral (7B) | 50.4 | No | Mistral 7B | 2023-10-10 | Yes |
| 13 | Random | 20.0 | No | CREPE: Can Vision-Language Foundation Models Rea... | 2022-12-13 | Yes |