Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on Video-MME (w/o subs)

Metric: Accuracy (%) (higher is better)

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Video-RAG (based on LLaVA-Video) | 77.4 | No | Video-RAG: Visually-aligned Retrieval-Augmented ... | 2024-11-20 | Code |
| 2 | Gemini 1.5 Pro | 71.9 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 3 | GPT-4o | 70.3 | No | GPT-4o: Visual perception performance of multimo... | 2024-06-14 | - |
| 4 | Gemini 1.5 Flash | 66.3 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 5 | LLaVA-OneVision (72B) | 64.8 | No | - | - | - |
| 6 | GPT-4o mini | 62.3 | No | GPT-4o: Visual perception performance of multimo... | 2024-06-14 | - |
| 7 | VILA-1.5 (34B) | 61.4 | No | VILA: On Pre-training for Visual Language Models | 2023-12-12 | Code |
| 8 | VideoLLaMA2 (72B) | 60.9 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 9 | VideoChat-T (7B) | 46.3 | No | TimeSuite: Improving MLLMs for Long Video Unders... | 2024-10-25 | Code |
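
For readers who want to work with these numbers offline, the "higher is better" ordering can be reproduced with a minimal Python sketch. The accuracy values are copied from the results above; the function name `rank_by_accuracy` and the "gap to leader" column are illustrative assumptions, not part of the site or of any official tooling.

```python
# (model, accuracy in %) pairs copied from the results listed above.
results = [
    ("Video-RAG (based on LLaVA-Video)", 77.4),
    ("Gemini 1.5 Pro", 71.9),
    ("GPT-4o", 70.3),
    ("Gemini 1.5 Flash", 66.3),
    ("LLaVA-OneVision (72B)", 64.8),
    ("GPT-4o mini", 62.3),
    ("VILA-1.5 (34B)", 61.4),
    ("VideoLLaMA2 (72B)", 60.9),
    ("VideoChat-T (7B)", 46.3),
]


def rank_by_accuracy(rows):
    """Hypothetical helper: sort rows by accuracy (highest first) and
    return (rank, model, accuracy, gap_to_leader) tuples."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    best = ordered[0][1]
    return [
        (position, model, accuracy, round(best - accuracy, 1))
        for position, (model, accuracy) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    for position, model, accuracy, gap in rank_by_accuracy(results):
        print(f"{position:>2}  {model:<34} {accuracy:5.1f}  (gap {gap:.1f})")
```

The gap is reported in percentage points and rounded to one decimal to match the precision of the published scores.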