Video Question Answering on iVQA

Metric: Accuracy (higher is better)

Results

#	Model	Accuracy	Extra Data	SOTA badge	Paper	Date	Code
1	Text + Text (no Multimodal Pretext Training)	40.2	No	Yes	Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval	2022-06-05	Code
2	FrozenBiLM	39.6	Yes	No	Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	2022-06-16	Code
3	VideoCoCa	39	Yes	No	VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	2022-12-09	-
4	Co-Tokenization	38.2	No	No	Video Question Answering with Iterative Video-Text Co-Tokenization	2022-08-01	-
5	Just Ask (fine-tune)	35.4	No	Yes	Just Ask: Learning to Answer Questions from Millions of Narrated Videos	2020-12-01	Code
6	FrozenBiLM (0-shot)	26.8	No	No	Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	2022-06-16	Code
7	Just Ask (0-shot)	12.2	No	No	Just Ask: Learning to Answer Questions from Millions of Narrated Videos	2020-12-01	Code