Video Question Answering on TVQA
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | LLaMA-VQA | 82.2 | No | Large Language Models are Temporal and Causal Re... | 2023-10-24 | Yes |
| 2 | FrozenBiLM | 82 | Yes | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Yes |
| 3 | VindLU | 79 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Yes |
| 4 | iPerceive (Chadha et al., 2020) | 76.96 | No | iPerceive: Applying Common-Sense Reasoning to Mu... | 2020-11-16 | - |
| 5 | Hero w/ pre-training | 74.24 | No | HERO: Hierarchical Encoder for Video+Language Om... | 2020-05-01 | Yes |
| 6 | STAGE (Lei et al., 2019) | 70.5 | No | TVQA+: Spatio-Temporal Grounding for Video Quest... | 2019-04-25 | Yes |
| 7 | FrozenBiLM (with speech) | 59.7 | No | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Yes |
| 8 | IG-VLM (no speech, GPT-4V) | 57.8 | No | An Image Grid Can Be Worth a Video: Zero-shot Vi... | 2024-03-27 | Yes |
| 9 | MiniGPT4-video-7B | 54.21 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Yes |
| 10 | VideoChat_HD_mistral (no speech) | 50.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Yes |
| 11 | VideoChat_mistral (no speech) | 46.4 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Yes |
| 12 | VideoChat2 (no speech) | 40.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Yes |
| 13 | SEVILA (no speech) | 38.2 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Yes |
| 14 | InternVideo (no speech) | 35.9 | No | InternVideo: General Video Foundation Models via... | 2022-12-06 | Yes |
| 15 | FrozenBiLM (no speech) | 29.7 | No | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Yes |