Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Question Answering on TVBench

Metric: Average Accuracy (higher is better)
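
The page does not say how Average Accuracy is aggregated. As a minimal sketch, assuming it is the unweighted mean of per-task accuracies expressed as a percentage, the computation could look like the following. The function name `average_accuracy`, the task names, and the counts are hypothetical and are not TVBench data.

```python
from statistics import mean


def average_accuracy(per_task_counts):
    """Return the unweighted mean of per-task accuracies, in percent.

    per_task_counts maps a task name to (num_correct, num_questions).
    Assumption: every task contributes equally, regardless of its size.
    """
    accuracies = []
    for task, (correct, total) in per_task_counts.items():
        if total <= 0:
            raise ValueError(f"task {task!r} has no questions")
        accuracies.append(100.0 * correct / total)
    return mean(accuracies)


# Hypothetical counts, for illustration only.
example = {"task_a": (30, 60), "task_b": (45, 60)}
score = average_accuracy(example)
```

If the benchmark instead averages over all questions, the per-task counts would be summed before dividing, and the two conventions can give different numbers when tasks differ in size.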

Results

| # | Model | Average Accuracy | Extra Data | Paper | Date | Code |
|---|-------|------------------|------------|-------|------|------|
| 1 | Seed1.5-VL thinking | 63.6 | No | Seed1.5-VL Technical Report | 2025-05-11 | - |
| 2 | PLM-8B | 63.5 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 3 | Seed1.5-VL | 61.5 | No | Seed1.5-VL Technical Report | 2025-05-11 | - |
| 4 | V-JEPA 2 ViT-g 8B | 60.6 | No | V-JEPA 2: Self-Supervised Video Models Enable Un... | 2025-06-11 | Code |
| 5 | PLM-3B | 58.9 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 6 | RRPO | 56.5 | No | Self-alignment of Large Video Language Models wi... | 2025-04-16 | - |
| 7 | Tarsier-34B | 55.5 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 8 | Tarsier2-7B | 54.7 | No | Tarsier2: Advancing Large Vision-Language Models... | 2025-01-14 | Code |
| 9 | Qwen2-VL-72B | 52.7 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 10 | IXC-2.5 7B | 51.6 | No | InternLM-XComposer-2.5: A Versatile Large Vision... | 2024-07-03 | Code |
| 11 | Aria | 51 | No | Aria: An Open Multimodal Native Mixture-of-Exper... | 2024-10-08 | Code |
| 12 | PLM-1B | 50.4 | No | PerceptionLM: Open-Access Data and Models for De... | 2025-04-17 | Code |
| 13 | LLaVA-Video 72B | 50 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 14 | VideoLLaMA2 72B | 48.4 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 15 | Gemini 1.5 Pro | 47.6 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 16 | Tarsier-7B | 46.9 | No | Tarsier: Recipes for Training and Evaluating Lar... | 2024-06-30 | Code |
| 17 | LLaVA-Video 7B | 45.6 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 18 | Qwen2-VL-7B | 43.8 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 19 | VideoLLaMA2 7B | 42.9 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 20 | PLLaVA-34B | 42.3 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 21 | mPLUG-Owl3 | 42.2 | No | mPLUG-Owl3: Towards Long Image-Sequence Understa... | 2024-08-09 | Code |
| 22 | VideoLLaMA2.1 | 42.1 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 23 | VideoGPT+ | 41.7 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 24 | GPT4o 8 frames | 39.9 | No | GPT-4o System Card | 2024-10-25 | - |
| 25 | PLLaVA-13B | 36.4 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 26 | ST-LLM | 35.7 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 27 | VideoChat2 | 35 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 28 | PLLaVA-7B | 34.9 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |