Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Question Answering on OVBench

Metric: AVG (higher is better)

Results

| # | Model | AVG | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | Seed1.5-VL | 60 | No | Seed1.5-VL Technical Report | 2025-05-11 | - |
| 2 | VideoChat-Online (4B) | 54.9 | No | Online Video Understanding: OVBench and VideoCha... | 2024-12-31 | Code |
| 3 | Gemini-1.5-Flash | 50.7 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 4 | Qwen2-VL (7B) | 49.7 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 5 | LLaVA-OneVision (7B) | 49.5 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 6 | InternVL2 (7B) | 48.7 | No | Expanding Performance Boundaries of Open-Source ... | 2024-12-06 | Code |
| 7 | InternVL2 (4B) | 44.1 | No | Expanding Performance Boundaries of Open-Source ... | 2024-12-06 | Code |
| 8 | LongVA (7B) | 43.6 | No | Long Context Transfer from Language to Vision | 2024-06-24 | Code |
| 9 | LLaMA-VID (7B) | 41.9 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Code |
| 10 | MiniCPM-V 2.6 (7B) | 39.1 | No | - | - | - |
| 11 | VTimeLLM (7B) | 33.1 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 12 | Flash-Vstream (7B) | 31.2 | No | Flash-VStream: Memory-Based Real-Time Understand... | 2024-06-12 | Code |
| 13 | MovieChat (7B) | 30.9 | No | MovieChat: From Dense Token to Sparse Memory for... | 2023-07-31 | Code |
| 14 | LITA (7B) | 20.4 | No | LITA: Language Instructed Temporal-Localization ... | 2024-03-27 | Code |
| 15 | TimeChat (7B) | 12.8 | No | TimeChat: A Time-sensitive Multimodal Large Lang... | 2023-12-04 | Code |
| 16 | VideoLLM-Online (7B) | 9.6 | No | VideoLLM-online: Online Video Large Language Mod... | 2024-06-17 | - |