Visual Question Answering (VQA) on VideoInstruct
Metric: mean (higher is better)
Results
| # | Model | mean | Extra Data | Paper | Date | Code |
|---|-------|------|------------|-------|------|------|
| 1 | PPLLaVA-7B-dpo | 3.73 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 2 | VLM-RLAIF | 3.49 | No | Tuning Large Multimodal Models for Videos using ... | 2024-02-06 | Code |
| 3 | TS-LLaVA-34B | 3.38 | No | TS-LLaVA: Constructing Visual Tokens through Thu... | 2024-11-17 | Code |
| 4 | PLLaVA-34B | 3.32 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Code |
| 5 | PPLLaVA-7B | 3.32 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Code |
| 6 | SlowFast-LLaVA-34B | 3.32 | No | SlowFast-LLaVA: A Strong Training-Free Baseline ... | 2024-07-22 | Code |
| 7 | VideoGPT+ | 3.28 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Code |
| 8 | IG-VLM-GPT4v | 3.17 | No | An Image Grid Can Be Worth a Video: Zero-shot Vi... | 2024-03-27 | Code |
| 9 | ST-LLM-7B | 3.15 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Code |
| 10 | VideoChat2_HD_mistral | 3.1 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 11 | CAT-7B | 3.07 | No | CAT: Enhancing Multimodal Large Language Model t... | 2024-03-07 | Code |
| 12 | LITA-13B | 3.04 | No | LITA: Language Instructed Temporal-Localization ... | 2024-03-27 | Code |
| 13 | LLaMA-VID-13B (2 Token) | 2.99 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Code |
| 14 | Chat-UniVi | 2.99 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 15 | VideoChat2 | 2.98 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 16 | LLaMA-VID-7B (2 Token) | 2.89 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Code |
| 17 | VTimeLLM | 2.85 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 18 | BT-Adapter | 2.69 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 19 | BT-Adapter (zero-shot) | 2.46 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Code |
| 20 | Video-ChatGPT | 2.38 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Code |
| 21 | Video Chat | 2.29 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Code |
| 22 | LLaMA Adapter | 2.16 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 23 | Video LLaMA | 1.98 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Code |