Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on Video-MME

Metric: Accuracy (%) (higher is better)

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------|------------|-------|------|------|
| 1 | Gemini 1.5 Pro | 81.3 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 2 | Video-RAG (Based on LLaVA-Video) | 77.4 | No | Video-RAG: Visually-aligned Retrieval-Augmented ... | 2024-11-20 | Code |
| 3 | GPT-4o | 77.2 | No | GPT-4o: Visual perception performance of multimo... | 2024-06-14 | - |
| 4 | Gemini 1.5 Flash | 75 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 5 | GPT-4o mini | 68.9 | No | GPT-4o: Visual perception performance of multimo... | 2024-06-14 | - |
| 6 | BIMBA-LLaVA-Qwen2-7B | 64.67 | No | BIMBA: Selective-Scan Compression for Long-Range... | 2025-03-12 | Code |
| 7 | VILA-1.5 (34B) | 64.1 | No | VILA: On Pre-training for Visual Language Models | 2023-12-12 | Code |
| 8 | MiniCPM-V 2.6 (8B) | 63.7 | No | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | 2024-08-03 | Code |
| 9 | VideoLLaMA2 (72B) | 63.1 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 10 | LongVU (7B) | 60.6 | No | LongVU: Spatiotemporal Adaptive Compression for ... | 2024-10-22 | Code |
| 11 | VideoChat-T (7B) | 55.8 | No | TimeSuite: Improving MLLMs for Long Video Unders... | 2024-10-25 | Code |