Video-based Generative Performance Benchmarking (Correctness of Information) on VideoInstruct
Metric: gpt-score (higher is better)
Results
| # | Model | gpt-score | Extra Data | SOTA | Paper | Date |
|---|-------|-----------|------------|------|-------|------|
| 1 | PPLLaVA-7B | 3.85 | No | SOTA | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | 2024-11-04 |
| 2 | PLLaVA-34B | 3.6 | No | SOTA | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | 2024-04-25 |
| 3 | TS-LLaVA-34B | 3.55 | No | | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | 2024-11-17 |
| 4 | SlowFast-LLaVA-34B | 3.48 | No | | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | 2024-07-22 |
| 5 | VideoChat2_HD_mistral | 3.4 | No | SOTA | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 2023-11-28 |
| 6 | VideoGPT+ | 3.27 | No | | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | 2024-06-13 |
| 7 | ST-LLM | 3.23 | No | | ST-LLM: Large Language Models Are Effective Temporal Learners | 2024-03-30 |
| 8 | MiniGPT4-video-7B | 3.08 | No | | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | 2024-04-04 |
| 9 | VideoChat2 | 3.02 | No | | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 2023-11-28 |
| 10 | Chat-UniVi | 2.89 | No | SOTA | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | 2023-11-14 |
| 11 | VTimeLLM | 2.78 | No | | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 |
| 12 | MovieChat | 2.76 | No | SOTA | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | 2023-07-31 |
| 13 | BT-Adapter | 2.68 | No | | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | 2023-09-27 |
| 14 | Video-ChatGPT | 2.4 | No | SOTA | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 2023-06-08 |
| 15 | Video Chat | 2.32 | No | SOTA | VideoChat: Chat-Centric Video Understanding | 2023-05-10 |
| 16 | BT-Adapter (zero-shot) | 2.16 | No | | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | 2023-09-27 |
| 17 | LLaMA Adapter | 2.03 | No | SOTA | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | 2023-04-28 |
| 18 | Video LLaMA | 1.96 | No | | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | 2023-06-05 |