Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Question Answering on TVQA

Metric: Accuracy (higher is better)

Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | LLaMA-VQA | 82.2 | No | Large Language Models are Temporal and Causal Re... | 2023-10-24 | Code |
| 2 | FrozenBiLM | 82 | Yes | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Code |
| 3 | VindLU | 79 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 4 | iPerceive (Chadha et al., 2020) | 76.96 | No | iPerceive: Applying Common-Sense Reasoning to Mu... | 2020-11-16 | - |
| 5 | Hero w/ pre-training | 74.24 | No | HERO: Hierarchical Encoder for Video+Language Om... | 2020-05-01 | Code |
| 6 | STAGE (Lei et al., 2019) | 70.5 | No | TVQA+: Spatio-Temporal Grounding for Video Quest... | 2019-04-25 | Code |
| 7 | FrozenBiLM (with speech) | 59.7 | No | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Code |
| 8 | IG-VLM (no speech, GPT-4V) | 57.8 | No | An Image Grid Can Be Worth a Video: Zero-shot Vi... | 2024-03-27 | Code |
| 9 | MiniGPT4-video-7B | 54.21 | No | MiniGPT4-Video: Advancing Multimodal LLMs for Vi... | 2024-04-04 | Code |
| 10 | VideoChat_HD_mistral (no speech) | 50.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 11 | VideoChat_mistral (no speech) | 46.4 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 12 | VideoChat2 (no speech) | 40.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 13 | SEVILA (no speech) | 38.2 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |
| 14 | InternVideo (no speech) | 35.9 | No | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 15 | FrozenBILM (no speech) | 29.7 | No | Zero-Shot Video Question Answering via Frozen Bi... | 2022-06-16 | Code |