Relation Extraction on Vinoground
Metric: Video Score (higher is better)
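As a rough illustration of what the Video Score measures, the sketch below assumes Vinoground follows the Winoground scoring scheme, transposed from images to videos. Each item pairs two captions with two videos, and a model gets the item right only if, for each caption, it rates the matching video above the other one. The tuple layout, the function names `video_correct` and `video_score`, and the use of numeric match scores are all hypothetical. Models on this leaderboard that answer a multiple-choice prompt may be scored from their chosen answers rather than from numeric scores.

```python
from typing import Sequence, Tuple

# One item: captions (c0, c1) and videos (v0, v1), where caption i describes video i.
# Values are model-assigned match scores s(caption, video); higher = better match.
# ASSUMPTION: the per-item rule mirrors Winoground's "image score" applied to videos;
# it is not taken from this leaderboard page.
PairScores = Tuple[float, float, float, float]  # s(c0,v0), s(c0,v1), s(c1,v0), s(c1,v1)


def video_correct(s00: float, s01: float, s10: float, s11: float) -> bool:
    """True if, for each caption, its matching video outscores the other video."""
    return s00 > s01 and s11 > s10


def video_score(items: Sequence[PairScores]) -> float:
    """Percentage of items where both caption-to-video choices are correct."""
    if not items:
        raise ValueError("need at least one item")
    hits = sum(video_correct(*item) for item in items)
    return 100.0 * hits / len(items)
```

With these definitions, `video_score([(0.9, 0.2, 0.3, 0.8), (0.6, 0.7, 0.1, 0.9)])` gives 50.0: the first item is correct, while in the second the first caption prefers the wrong video.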
Results
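| # | Model | Video Score | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-------------|------------|-------|------|------|------|
| 1 | GPT-4o (CoT) | 51.0 | No | - | - | - | |
| 2 | GPT-4o | 38.2 | No | - | - | - | |
| 3 | LLaVA-OneVision-Qwen2-72B | 35.2 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Yes | SOTA |
| 4 | Qwen2-VL-72B | 32.6 | No | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 2024-09-18 | Yes | |
| 5 | Qwen2-VL-7B | 32.4 | No | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 2024-09-18 | Yes | |
| 6 | LLaVA-OneVision-Qwen2-7B | 29.4 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Yes | |
| 7 | MiniCPM-2.6 | 29.2 | No | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | 2024-08-03 | Yes | SOTA |
| 8 | Claude 3.5 Sonnet | 28.8 | No | - | - | - | |
| 9 | InternLM-XC-2.5 (CoT) | 28.4 | No | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | 2024-07-03 | Yes | SOTA |
| 10 | InternLM-XC-2.5 | 27.8 | No | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | 2024-07-03 | Yes | |
| 11 | Gemini-1.5-Pro (CoT) | 27.6 | No | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 2024-03-08 | Yes | SOTA |
| 12 | VTimeLLM | 27.0 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Yes | SOTA |
| 13 | LLaVA-NeXT-Video-7B (CoT) | 26.2 | No | - | - | - | |
| 14 | Video-LLaVA-7B | 25.8 | No | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | 2023-11-16 | Yes | SOTA |
| 15 | MA-LMM-Vicuna-7B | 25.6 | No | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | 2024-04-08 | Yes | |
| 16 | LLaVA-NeXT-Video-7B | 25.6 | No | - | - | - | |
| 17 | Gemini-1.5-Pro | 22.6 | No | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 2024-03-08 | Yes | |
| 18 | Phi-3.5-Vision | 22.4 | No | - | - | - | |
| 19 | LLaVA-NeXT-Video-34B (CoT) | 22.2 | No | - | - | - | |
| 20 | VideoLLaMA2-72B | 21.8 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | 2024-06-11 | Yes | |
| 21 | LLaVA-NeXT-Video-34B | 21.2 | No | - | - | - | |
| 22 | LanguageBind | 5.0 | No | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 2023-10-03 | Yes | SOTA |
| 23 | ImageBind | 3.4 | No | ImageBind: One Embedding Space To Bind Them All | 2023-05-09 | Yes | SOTA |
| 24 | VideoCLIP | 2.8 | No | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | 2021-09-28 | Yes | SOTA |

SOTA: entry flagged as state of the art on the page. Each flagged entry had the best Video Score among dated entries up to its publication date. "-" means no paper, date, or code link is listed.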
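The SOTA badges on this page line up with a simple running maximum: after sorting the dated entries by publication date, an entry is flagged when its Video Score exceeds every earlier one. The sketch below reproduces that rule under the assumption that this is how the badges are assigned. Undated entries (those without a paper) are skipped, and entries sharing a date are ordered best-first so that a weaker same-day variant is not flagged. The function name `sota_at_release` and the `PAGE_ENTRIES` list are illustrative; the scores and dates are copied from the leaderboard.

```python
from datetime import date
from typing import List, Optional, Tuple

# (model name, Video Score, publication date or None when the page lists "-")
Entry = Tuple[str, float, Optional[date]]


def sota_at_release(entries: List[Entry]) -> List[str]:
    """Return models whose score beat every earlier-dated entry.

    ASSUMPTION: this running-maximum rule is inferred from the page's SOTA
    badges, not taken from any documented leaderboard specification.
    """
    # Drop undated entries; sort by date, best score first within a date.
    dated = sorted(
        (e for e in entries if e[2] is not None),
        key=lambda e: (e[2], -e[1]),
    )
    best = float("-inf")
    flagged: List[str] = []
    for model, score, _ in dated:
        if score > best:  # strictly better than everything released so far
            best = score
            flagged.append(model)
    return flagged


# Dated entries copied from the leaderboard.
PAGE_ENTRIES: List[Entry] = [
    ("LLaVA-OneVision-Qwen2-72B", 35.2, date(2024, 8, 6)),
    ("Qwen2-VL-72B", 32.6, date(2024, 9, 18)),
    ("Qwen2-VL-7B", 32.4, date(2024, 9, 18)),
    ("LLaVA-OneVision-Qwen2-7B", 29.4, date(2024, 8, 6)),
    ("MiniCPM-2.6", 29.2, date(2024, 8, 3)),
    ("InternLM-XC-2.5 (CoT)", 28.4, date(2024, 7, 3)),
    ("InternLM-XC-2.5", 27.8, date(2024, 7, 3)),
    ("Gemini-1.5-Pro (CoT)", 27.6, date(2024, 3, 8)),
    ("VTimeLLM", 27.0, date(2023, 11, 30)),
    ("Video-LLaVA-7B", 25.8, date(2023, 11, 16)),
    ("MA-LMM-Vicuna-7B", 25.6, date(2024, 4, 8)),
    ("Gemini-1.5-Pro", 22.6, date(2024, 3, 8)),
    ("VideoLLaMA2-72B", 21.8, date(2024, 6, 11)),
    ("LanguageBind", 5.0, date(2023, 10, 3)),
    ("ImageBind", 3.4, date(2023, 5, 9)),
    ("VideoCLIP", 2.8, date(2021, 9, 28)),
]

if __name__ == "__main__":
    print(sota_at_release(PAGE_ENTRIES))
```

Run on the page's sixteen dated entries, this returns the nine models that carry a SOTA badge, from VideoCLIP (2021-09-28) up to LLaVA-OneVision-Qwen2-72B (2024-08-06).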