Video Question Answering on OVBench
Metric: AVG (higher is better)
Results
| # | Model | AVG | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-----|------------|-------|------|------|------|
| 1 | Seed1.5-VL | 60 | No | Seed1.5-VL Technical Report | 2025-05-11 | - | SOTA |
| 2 | VideoChat-Online (4B) | 54.9 | No | Online Video Understanding: OVBench and VideoChat-Online | 2024-12-31 | Code | SOTA |
| 3 | Gemini-1.5-Flash | 50.7 | No | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 2024-03-08 | Code | SOTA |
| 4 | Qwen2-VL (7B) | 49.7 | No | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 2024-09-18 | Code | |
| 5 | LLaVA-OneVision (7B) | 49.5 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code | |
| 6 | InternVL2 (7B) | 48.7 | No | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 2024-12-06 | Code | |
| 7 | InternVL2 (4B) | 44.1 | No | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 2024-12-06 | Code | |
| 8 | LongVA (7B) | 43.6 | No | Long Context Transfer from Language to Vision | 2024-06-24 | Code | |
| 9 | LLaMA-VID (7B) | 41.9 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | 2023-11-28 | Code | SOTA |
| 10 | MiniCPM-V 2.6 (7B) | 39.1 | No | - | - | - | |
| 11 | VTimeLLM (7B) | 33.1 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code | |
| 12 | Flash-VStream (7B) | 31.2 | No | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | 2024-06-12 | Code | |
| 13 | MovieChat (7B) | 30.9 | No | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | 2023-07-31 | Code | SOTA |
| 14 | LITA (7B) | 20.4 | No | LITA: Language Instructed Temporal-Localization Assistant | 2024-03-27 | Code | |
| 15 | TimeChat (7B) | 12.8 | No | TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | 2023-12-04 | Code | |
| 16 | VideoLLM-Online (7B) | 9.6 | No | VideoLLM-online: Online Video Large Language Model for Streaming Video | 2024-06-17 | - | |