Video Question Answering on How2QA

Metric: Accuracy (higher is better)
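
The page does not say how accuracy is computed. The sketch below assumes the usual convention for answer-selection tasks: the percentage of questions where the model's chosen candidate matches the single correct one. The function name and toy data are hypothetical and only illustrate the arithmetic; they are not taken from any of the listed papers.

```python
from typing import Sequence


def accuracy_percent(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Return the percentage of questions whose predicted choice equals the gold choice."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count questions where the selected answer index matches the reference index.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: index of the chosen answer for five questions.
toy_predicted = [0, 2, 1, 3, 3]
toy_gold = [0, 2, 2, 3, 1]
toy_score = accuracy_percent(toy_predicted, toy_gold)  # 3 of 5 match
```

Under this assumption, a leaderboard score of 93.2 would mean that 93.2% of the evaluated questions were answered correctly.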

Results

#	Model	Accuracy	Extra Data	Paper	Date	Code
1	Text + Text (no Multimodal Pretext Training)	93.2	No	Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval	2022-06-05	Yes
2	FrozenBiLM	86.7	Yes	Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	2022-06-16	Yes
3	Just Ask	84.4	Yes	Just Ask: Learning to Answer Questions from Millions of Narrated Videos	2020-12-01	Yes
4	SeViLA	83.7	No	-	-	-
5	Hero w/ pre-training	77.75	No	HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training	2020-05-01	Yes
6	ATP	65.1	No	Revisiting the "Video" in Video-Language Understanding	2022-06-03	Yes
7	FrozenBiLM (0-shot)	58.4	No	Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	2022-06-16	Yes
8	Just Ask (0-shot)	51.1	No	Just Ask: Learning to Answer Questions from Millions of Narrated Videos	2020-12-01	Yes

Entries 1, 3, and 5 carry the leaderboard's SOTA marker. Entry 4 (SeViLA) has no paper, date, or code listed.