Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Question Answering on EgoSchema (subset)

Metric: Accuracy (higher is better)

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Tarsier (34B) | 68.6 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 2 | VideoChat-T (7B) | 68.4 | No | TimeSuite: Improving MLLMs for Long Video Unders... | 2024-10-25 | Code |
| 3 | LangRepo (12B) | 66.2 | No | Language Repository for Long Video Understanding | 2024-03-21 | Code |
| 4 | VideoTree (GPT4) | 66.2 | No | VideoTree: Adaptive Tree-based Video Representat... | 2024-05-29 | Code |
| 5 | LVNet | 66.0 | No | Too Many Frames, Not All Useful: Efficient Strat... | 2024-06-13 | Code |
| 6 | VideoChat2_HD_mistral | 65.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 7 | VideoChat2_mistral | 63.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 8 | MVU (13B) | 60.3 | No | Understanding Long Videos with Multimodal Langua... | 2024-03-25 | Code |
| 9 | TS-LLaVA-34B | 57.8 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 10 | LLoVi (GPT-3.5) | 57.6 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Code |
| 11 | LLoVi (7B) | 50.8 | No | A Simple LLM Framework for Long-Range Video Ques... | 2023-12-28 | Code |
| 12 | SlowFast-LLaVA-34B | 47.2 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 13 | SeViLA (4B) | 25.7 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |
| 14 | Random | 20.0 | No | CREPE: Can Vision-Language Foundation Models Rea... | 2022-12-13 | Code |