Visual Question Answering (VQA) on VideoInstruct

Metric: gpt-score (higher is better)

Results

#	Model	gpt-score	Extra Data	Paper	Date	Code
1	PPLLaVA-7B	4.21	No	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	2024-11-04	Code
2	PLLaVA-34B	3.9	No	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Code
3	TS-LLaVA-34B	3.86	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Code
4	PPLLaVA-7B	3.85	No	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	2024-11-04	Code
5	SlowFast-LLaVA-34B	3.84	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Code
6	PPLLaVA-7B	3.81	No	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	2024-11-04	Code
7	ST-LLM	3.74	No	ST-LLM: Large Language Models Are Effective Temporal Learners	2024-03-30	Code
8	VideoGPT+	3.74	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
9	TS-LLaVA-34B	3.69	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Code
10	VideoChat2_HD_mistral	3.64	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
11	PLLaVA-34B	3.6	No	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Code
12	MiniGPT4-video-7B	3.57	No	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	2024-04-04	Code
13	SlowFast-LLaVA-34B	3.57	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Code
14	PPLLaVA-7B	3.56	No	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	2024-11-04	Code
15	TS-LLaVA-34B	3.55	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Code
16	VideoChat2	3.51	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
17	SlowFast-LLaVA-34B	3.48	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Code
18	Chat-UniVi	3.46	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
19	VTimeLLM	3.4	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
20	VideoChat2_HD_mistral	3.4	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
21	VideoGPT+	3.39	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
22	BT-Adapter	3.27	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
23	VideoGPT+	3.27	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
24	PLLaVA-34B	3.25	No	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Code
25	ST-LLM	3.23	No	ST-LLM: Large Language Models Are Effective Temporal Learners	2024-03-30	Code
26	PPLLaVA-7B	3.21	No	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	2024-11-04	Code
27	PLLaVA-34B	3.2	No	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Code
28	VideoGPT+	3.18	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
29	VTimeLLM	3.1	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
30	MiniGPT4-video-7B	3.08	No	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	2024-04-04	Code
31	ST-LLM	3.05	No	ST-LLM: Large Language Models Are Effective Temporal Learners	2024-03-30	Code
32	TS-LLaVA-34B	3.03	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Code
33	VideoChat2	3.02	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
34	MiniGPT4-video-7B	3.02	No	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	2024-04-04	Code
35	MovieChat	3.01	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code
36	SlowFast-LLaVA-34B	2.96	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Code
37	MovieChat	2.93	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code
38	ST-LLM	2.93	No	ST-LLM: Large Language Models Are Effective Temporal Learners	2024-03-30	Code
39	Chat-UniVi	2.91	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
40	BT-Adapter (zero-shot)	2.89	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
41	Chat-UniVi	2.89	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
42	VideoChat2	2.88	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
43	VideoChat2_HD_mistral	2.86	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
44	VideoGPT+	2.83	No	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2024-06-13	Code
45	Chat-UniVi	2.81	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
46	VideoChat2	2.81	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
47	ST-LLM	2.81	No	ST-LLM: Large Language Models Are Effective Temporal Learners	2024-03-30	Code
48	VTimeLLM	2.78	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
49	SlowFast-LLaVA-34B	2.77	No	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	2024-07-22	Code
50	TS-LLaVA-34B	2.77	No	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	2024-11-17	Code
51	MovieChat	2.76	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code
52	BT-Adapter	2.69	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
53	BT-Adapter	2.68	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
54	PLLaVA-34B	2.67	No	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	2024-04-25	Code
55	MiniGPT4-video-7B	2.67	No	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	2024-04-04	Code
56	VideoChat2	2.66	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
57	MiniGPT4-video-7B	2.65	No	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	2024-04-04	Code
58	VideoChat2_HD_mistral	2.65	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
59	Video-ChatGPT	2.62	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code
60	VideoChat2_HD_mistral	2.62	No	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2023-11-28	Code
61	Video Chat	2.53	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Code
62	Video-ChatGPT	2.52	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code
63	Video Chat	2.5	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Code
64	VTimeLLM	2.49	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
65	VTimeLLM	2.47	No	VTimeLLM: Empower LLM to Grasp Video Moments	2023-11-30	Code
66	BT-Adapter (zero-shot)	2.46	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
67	BT-Adapter	2.46	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
68	MovieChat	2.42	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code
69	Video-ChatGPT	2.4	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code
70	Chat-UniVi	2.39	No	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2023-11-14	Code
71	Video-ChatGPT	2.37	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code
72	BT-Adapter	2.34	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
73	Video Chat	2.32	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Code
74	LLaMA Adapter	2.32	No	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	2023-04-28	Code
75	LLaMA Adapter	2.3	No	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	2023-04-28	Code
76	MovieChat	2.24	No	MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	2023-07-31	Code
77	Video Chat	2.24	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Code
78	BT-Adapter (zero-shot)	2.2	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
79	Video LLaMA	2.18	No	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023-06-05	Code
80	Video LLaMA	2.16	No	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023-06-05	Code
81	BT-Adapter (zero-shot)	2.16	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
82	LLaMA Adapter	2.15	No	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	2023-04-28	Code
83	BT-Adapter (zero-shot)	2.13	No	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2023-09-27	Code
84	LLaMA Adapter	2.03	No	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	2023-04-28	Code
85	Video-ChatGPT	1.98	No	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023-06-08	Code
86	LLaMA Adapter	1.98	No	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	2023-04-28	Code
87	Video LLaMA	1.96	No	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023-06-05	Code
88	Video Chat	1.94	No	VideoChat: Chat-Centric Video Understanding	2023-05-10	Code
89	Video LLaMA	1.82	No	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023-06-05	Code
90	Video LLaMA	1.79	No	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023-06-05	Code
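
Because the table mixes models published between 2023-04 and 2024-11, one way to read it is to ask which entries set a new best gpt-score at the time they were published. The sketch below is only an illustration: the function name `sota_progression` is hypothetical, the "running best by date" rule is an assumed reading rather than anything the leaderboard defines, and `ROWS` holds just a handful of (model, score, date) rows copied from the table.

```python
from datetime import date

# A few (model, gpt-score, publication date) rows copied from the table above.
# This is a subset chosen for illustration, not the full leaderboard.
ROWS = [
    ("PPLLaVA-7B", 4.21, "2024-11-04"),
    ("PLLaVA-34B", 3.90, "2024-04-25"),
    ("TS-LLaVA-34B", 3.86, "2024-11-17"),
    ("ST-LLM", 3.74, "2024-03-30"),
    ("VideoChat2_HD_mistral", 3.64, "2023-11-28"),
    ("Chat-UniVi", 3.46, "2023-11-14"),
    ("BT-Adapter", 3.27, "2023-09-27"),
    ("MovieChat", 3.01, "2023-07-31"),
    ("Video-ChatGPT", 2.62, "2023-06-08"),
    ("Video Chat", 2.53, "2023-05-10"),
    ("LLaMA Adapter", 2.32, "2023-04-28"),
]


def sota_progression(rows):
    """Return the rows whose score strictly beats every earlier-dated row.

    Assumed rule (not taken from the leaderboard): walk the rows in date
    order and keep a row only when it raises the best score seen so far.
    """
    # Sort by date ascending; for same-day rows, look at the higher score first.
    ordered = sorted(rows, key=lambda r: (date.fromisoformat(r[2]), -r[1]))
    best = float("-inf")
    progression = []
    for model, score, day in ordered:
        if score > best:
            progression.append((model, score, day))
            best = score
    return progression


if __name__ == "__main__":
    for model, score, day in sota_progression(ROWS):
        print(f"{day}  {score:.2f}  {model}")
```

On this subset, TS-LLaVA-34B (3.86, 2024-11-17) is dropped because PPLLaVA-7B had already reached 4.21 on 2024-11-04; every other listed row raises the running best.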
