Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video Question Answering on STAR Benchmark

Metric: Average Accuracy (higher is better)
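The page does not define how Average Accuracy is aggregated. As a rough sketch only, assuming the score is the unweighted mean of per-question-type accuracies (in percent), it could be computed as below. The function name `average_accuracy` and the `(question_type, predicted, gold)` record layout are hypothetical, chosen for illustration, and are not taken from the benchmark's tooling.

```python
from collections import defaultdict


def average_accuracy(records):
    """Return (per-type accuracy dict, unweighted mean), both in percent.

    `records` is an iterable of (question_type, predicted, gold) tuples.
    ASSUMPTION: the leaderboard metric is the plain mean over question
    types; the source page does not confirm this.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for question_type, predicted, gold in records:
        total[question_type] += 1
        correct[question_type] += int(predicted == gold)

    # Accuracy per question type, expressed as a percentage.
    per_type = {q: 100.0 * correct[q] / total[q] for q in total}

    # Unweighted mean across question types (0.0 if there are no records).
    mean = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return per_type, mean
```

If the benchmark instead averages over all questions regardless of type, the mean would be weighted by the number of questions per type, and the two figures can differ when types are unevenly sized.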


Results

| # | Model | Average Accuracy | Extra Data | Paper | Date | Code |
|---|-------|------------------|------------|-------|------|------|
| 1 | VLAP (4 frames) | 67.1 | No | ViLA: Efficient Video-Language Alignment for Vid... | 2023-12-13 | Code |
| 2 | LLaMA-VQA | 65.4 | No | Large Language Models are Temporal and Causal Re... | 2023-10-24 | Code |
| 3 | SeViLA | 64.9 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |
| 4 | InternVideo | 58.7 | No | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 5 | GF(sup) | 53.94 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 | Code |
| 6 | GF(uns) | 53.86 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 | Code |
| 7 | MIST | 51.13 | No | MIST: Multi-modal Iterative Spatial-Temporal Tra... | 2022-12-19 | Code |
| 8 | Temp[ATP] | 48.37 | No | Revisiting the "Video" in Video-Language Underst... | 2022-06-03 | Code |
| 9 | AnyMAL-70B (0-shot) | 48.2 | No | AnyMAL: An Efficient and Scalable Any-Modality A... | 2023-09-27 | Code |
| 10 | All-in-one | 47.5 | No | All in One: Exploring Unified Video-Language Pre... | 2022-03-14 | Code |
| 11 | TraveLER (0-shot) | 44.9 | No | TraveLER: A Modular Multi-LMM Agent Framework fo... | 2024-04-01 | Code |
| 12 | SeViLA (0-shot) | 44.6 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |
| 13 | Flamingo-9B (4-shot) | 42.8 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 14 | Flamingo-80B (4-shot) | 42.4 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 15 | Flamingo-9B (0-shot) | 41.8 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 16 | Flamingo-80B (0-shot) | 39.7 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 17 | SHG-VQA (trained from scratch) | 39.47 | No | Learning Situation Hyper-Graphs for Video Questi... | 2023-04-18 | Code |