Video Question Answering on OVBench

Metric: AVG (higher is better)

Results

#	Model	AVG	Extra Data	Paper	Date	Code	SOTA
1	Seed1.5-VL	60	No	Seed1.5-VL Technical Report	2025-05-11	-	SOTA
2	VideoChat-Online (4B)	54.9	No	Online Video Understanding: OVBench and VideoChat-Online	2024-12-31	Code	SOTA
3	Gemini-1.5-Flash	50.7	No	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	2024-03-08	Code	SOTA
4	Qwen2-VL (7B)	49.7	No	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	2024-09-18	Code	-
5	LLaVA-OneVision (7B)	49.5	No	LLaVA-OneVision: Easy Visual Task Transfer	2024-08-06	Code	-
6	InternVL2 (7B)	48.7	No	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	2024-12-06	Code	-
7	InternVL2 (4B)	44.1	No	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	2024-12-06	Code	-
8	LongVA (7B)	43.6	No	Long Context Transfer from Language to Vision	2024-06-24	Code	-
9	LLaMA-VID (7B)	41.9	No	LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	2023-11-28	Code	SOTA
10	MiniCPM-V 2.6 (7B)	39.1	No	-	-	-	-
11	VTimeLLM (7B)	33.1	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code	-
12	Flash-VStream (7B)	31.2	No	Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	2024-06-12	Code	-
13	MovieChat (7B)	30.9	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code	SOTA
14	LITA (7B)	20.4	No	LITA: Language Instructed Temporal-Localization Assistant	2024-03-27	Code	-
15	TimeChat (7B)	12.8	No	TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	2023-12-04	Code	-
16	VideoLLM-Online (7B)	9.6	No	VideoLLM-online: Online Video Large Language Model for Streaming Video	2024-06-17	-	-
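
The SOTA markers in the results above are consistent with a running-maximum reading: an entry is marked when its AVG exceeds every earlier-dated entry. The page does not state this rule, so the Python sketch below treats it as an assumption. The function name `sota_entries` and the `ENTRIES` list are illustrative; the scores and dates are copied from the results, and MiniCPM-V 2.6 is omitted because it has no listed date.

```python
# Minimal sketch: recover the SOTA markers from (model, AVG, date) triples.
# Assumption (not stated on the page): an entry is SOTA if its AVG is
# strictly higher than every entry with an earlier date.

ENTRIES = [
    ("Seed1.5-VL", 60.0, "2025-05-11"),
    ("VideoChat-Online (4B)", 54.9, "2024-12-31"),
    ("Gemini-1.5-Flash", 50.7, "2024-03-08"),
    ("Qwen2-VL (7B)", 49.7, "2024-09-18"),
    ("LLaVA-OneVision (7B)", 49.5, "2024-08-06"),
    ("InternVL2 (7B)", 48.7, "2024-12-06"),
    ("InternVL2 (4B)", 44.1, "2024-12-06"),
    ("LongVA (7B)", 43.6, "2024-06-24"),
    ("LLaMA-VID (7B)", 41.9, "2023-11-28"),
    ("VTimeLLM (7B)", 33.1, "2023-11-30"),
    ("Flash-VStream (7B)", 31.2, "2024-06-12"),
    ("MovieChat (7B)", 30.9, "2023-07-31"),
    ("LITA (7B)", 20.4, "2024-03-27"),
    ("TimeChat (7B)", 12.8, "2023-12-04"),
    ("VideoLLM-Online (7B)", 9.6, "2024-06-17"),
]


def sota_entries(entries):
    """Return model names that set a new best AVG, in date order."""
    best = float("-inf")
    result = []
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for model, avg, _date in sorted(entries, key=lambda e: e[2]):
        if avg > best:
            best = avg
            result.append(model)
    return result


if __name__ == "__main__":
    for name in sota_entries(ENTRIES):
        print(name)
```

Under this assumption the function returns MovieChat (7B), LLaMA-VID (7B), Gemini-1.5-Flash, VideoChat-Online (4B), and Seed1.5-VL, which matches the five entries marked SOTA.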