Temporal Relation Extraction on Vinoground

Metric: Group Score (higher is better)
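
The page names the metric but does not define it. A minimal sketch, assuming Group Score follows the Winoground-style convention in which a caption/video pair counts only when the model answers both the text-side and the video-side questions for that pair correctly; the function name and the input layout are hypothetical, not taken from any released evaluation code:

```python
def group_score(results):
    """Percentage of pairs where both checks passed.

    `results` is a list of (text_ok, video_ok) booleans, one tuple per
    caption/video pair. This layout is an assumption for illustration.
    """
    if not results:
        raise ValueError("results must contain at least one pair")
    hits = sum(1 for text_ok, video_ok in results if text_ok and video_ok)
    return 100.0 * hits / len(results)


# Hypothetical run: 4 pairs, only the first passes both checks.
example = [(True, True), (True, False), (False, True), (False, False)]
print(group_score(example))
```

Under this reading a model that guesses at random scores well below 50, which would explain why even the top entries in the table have low absolute numbers.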

Results

#	Model	Group Score	Extra Data	Paper	Date	Code
1	GPT-4o (CoT)	35	No	-	-	-
2	GPT-4o	24.6	No	-	-	-
3	LLaVA-OneVision-Qwen2-72B	21.8	No	LLaVA-OneVision: Easy Visual Task Transfer	2024-08-06	Code
4	Qwen2-VL-72B	17.4	No	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	2024-09-18	Code
5	Qwen2-VL-7B	15.2	No	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	2024-09-18	Code
6	LLaVA-OneVision-Qwen2-7B	14.6	No	LLaVA-OneVision: Easy Visual Task Transfer	2024-08-06	Code
7	Gemini-1.5-Pro (CoT)	12.4	No	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	2024-03-08	Code
8	MiniCPM-2.6	11.2	No	MiniCPM-V: A GPT-4V Level MLLM on Your Phone	2024-08-03	Code
9	Claude 3.5 Sonnet	10.6	No	-	-	-
10	Gemini-1.5-Pro	10.2	No	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	2024-03-08	Code
11	InternLM-XC-2.5	9.6	No	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	2024-07-03	Code
12	InternLM-XC-2.5 (CoT)	9	No	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	2024-07-03	Code
13	VideoLLaMA2-72B	8.4	No	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	2024-06-11	Code
14	MA-LMM-Vicuna-7B	6.8	No	MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	2024-04-08	Code
15	LLaVA-NeXT-Video-7B (CoT)	6.8	No	-	-	-
16	Video-LLaVA-7B	6.6	No	Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	2023-11-16	Code
17	Phi-3.5-Vision	6.2	No	-	-	-
18	LLaVA-NeXT-Video-7B	6.2	No	-	-	-
19	LLaVA-NeXT-Video-34B (CoT)	5.2	No	-	-	-
20	VTimeLLM	5.2	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
21	LLaVA-NeXT-Video-34B	3.8	No	-	-	-
22	VideoCLIP	1.2	No	VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	2021-09-28	Code
23	LanguageBind	1.2	No	LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	2023-10-03	Code
24	ImageBind	0.6	No	ImageBind: One Embedding Space To Bind Them All	2023-05-09	Code
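
The ranking is sorted by score, so it does not show how the best reported result moved over time. A minimal sketch, assuming the dated entries are copied by hand into a list (entries without a date are left out because they cannot be placed on a timeline, and the dates are paper dates, not evaluation dates); the names `DATED_ROWS` and `running_best` are hypothetical:

```python
from datetime import date

# (model, group score, paper date) for the dated leaderboard entries.
DATED_ROWS = [
    ("LLaVA-OneVision-Qwen2-72B", 21.8, date(2024, 8, 6)),
    ("Qwen2-VL-72B", 17.4, date(2024, 9, 18)),
    ("Qwen2-VL-7B", 15.2, date(2024, 9, 18)),
    ("LLaVA-OneVision-Qwen2-7B", 14.6, date(2024, 8, 6)),
    ("Gemini-1.5-Pro (CoT)", 12.4, date(2024, 3, 8)),
    ("MiniCPM-2.6", 11.2, date(2024, 8, 3)),
    ("Gemini-1.5-Pro", 10.2, date(2024, 3, 8)),
    ("InternLM-XC-2.5", 9.6, date(2024, 7, 3)),
    ("InternLM-XC-2.5 (CoT)", 9.0, date(2024, 7, 3)),
    ("VideoLLaMA2-72B", 8.4, date(2024, 6, 11)),
    ("MA-LMM-Vicuna-7B", 6.8, date(2024, 4, 8)),
    ("Video-LLaVA-7B", 6.6, date(2023, 11, 16)),
    ("VTimeLLM", 5.2, date(2023, 11, 30)),
    ("VideoCLIP", 1.2, date(2021, 9, 28)),
    ("LanguageBind", 1.2, date(2023, 10, 3)),
    ("ImageBind", 0.6, date(2023, 5, 9)),
]


def running_best(rows):
    """Return the entries that strictly beat every earlier-dated entry."""
    best = float("-inf")
    progression = []
    # Oldest first; on the same date, take the higher score first so a
    # weaker same-day variant is never counted as a new best.
    for model, score, day in sorted(rows, key=lambda r: (r[2], -r[1])):
        if score > best:
            best = score
            progression.append((day.isoformat(), model, score))
    return progression


for day, model, score in running_best(DATED_ROWS):
    print(day, model, score)
```

Ties are not counted as a new best, so LanguageBind (1.2, dated after VideoCLIP's 1.2) does not appear in the progression.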