Temporal Relation Extraction on Vinoground
Metric: Group Score (higher is better)
Results
| # | Model | Group Score | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | GPT-4o (CoT) | 35 | No | - | - | - |
| 2 | GPT-4o | 24.6 | No | - | - | - |
| 3 | LLaVA-OneVision-Qwen2-72B | 21.8 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 4 | Qwen2-VL-72B | 17.4 | No | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 2024-09-18 | Code |
| 5 | Qwen2-VL-7B | 15.2 | No | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 2024-09-18 | Code |
| 6 | LLaVA-OneVision-Qwen2-7B | 14.6 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 7 | Gemini-1.5-Pro (CoT) | 12.4 | No | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 2024-03-08 | Code |
| 8 | MiniCPM-2.6 | 11.2 | No | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | 2024-08-03 | Code |
| 9 | Claude 3.5 Sonnet | 10.6 | No | - | - | - |
| 10 | Gemini-1.5-Pro | 10.2 | No | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 2024-03-08 | Code |
| 11 | InternLM-XC-2.5 | 9.6 | No | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | 2024-07-03 | Code |
| 12 | InternLM-XC-2.5 (CoT) | 9 | No | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | 2024-07-03 | Code |
| 13 | VideoLLaMA2-72B | 8.4 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | 2024-06-11 | Code |
| 14 | MA-LMM-Vicuna-7B | 6.8 | No | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | 2024-04-08 | Code |
| 15 | LLaVA-NeXT-Video-7B (CoT) | 6.8 | No | - | - | - |
| 16 | Video-LLaVA-7B | 6.6 | No | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | 2023-11-16 | Code |
| 17 | Phi-3.5-Vision | 6.2 | No | - | - | - |
| 18 | LLaVA-NeXT-Video-7B | 6.2 | No | - | - | - |
| 19 | LLaVA-NeXT-Video-34B (CoT) | 5.2 | No | - | - | - |
| 20 | VTimeLLM | 5.2 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 21 | LLaVA-NeXT-Video-34B | 3.8 | No | - | - | - |
| 22 | VideoCLIP | 1.2 | No | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | 2021-09-28 | Code |
| 23 | LanguageBind | 1.2 | No | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 2023-10-03 | Code |
| 24 | ImageBind | 0.6 | No | ImageBind: One Embedding Space To Bind Them All | 2023-05-09 | Code |