Video Question Answering on STAR Benchmark
Metric: Average Accuracy (higher is better)
Results
| # | Model | Average Accuracy | Extra Data | Paper | Date |
|---|---|---|---|---|---|
| 1 | VLAP (4 frames) | 67.1 | No | ViLA: Efficient Video-Language Alignment for Vid... | 2023-12-13 |
| 2 | LLaMA-VQA | 65.4 | No | Large Language Models are Temporal and Causal Re... | 2023-10-24 |
| 3 | SeViLA | 64.9 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 |
| 4 | InternVideo | 58.7 | No | InternVideo: General Video Foundation Models via... | 2022-12-06 |
| 5 | GF(sup) | 53.94 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 |
| 6 | GF(uns) | 53.86 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 |
| 7 | MIST | 51.13 | No | MIST: Multi-modal Iterative Spatial-Temporal Tra... | 2022-12-19 |
| 8 | Temp[ATP] | 48.37 | No | Revisiting the "Video" in Video-Language Underst... | 2022-06-03 |
| 9 | AnyMAL-70B (0-shot) | 48.2 | No | AnyMAL: An Efficient and Scalable Any-Modality A... | 2023-09-27 |
| 10 | All-in-one | 47.5 | No | All in One: Exploring Unified Video-Language Pre... | 2022-03-14 |
| 11 | TraveLER (0-shot) | 44.9 | No | TraveLER: A Modular Multi-LMM Agent Framework fo... | 2024-04-01 |
| 12 | SeViLA (0-shot) | 44.6 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 |
| 13 | Flamingo-9B (4-shot) | 42.8 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 |
| 14 | Flamingo-80B (4-shot) | 42.4 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 |
| 15 | Flamingo-9B (0-shot) | 41.8 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 |
| 16 | Flamingo-80B (0-shot) | 39.7 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 |
| 17 | SHG-VQA (trained from scratch) | 39.47 | No | Learning Situation Hyper-Graphs for Video Questi... | 2023-04-18 |