Video on Kinetics-700

Metric: Top-5 Accuracy (higher is better)
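
For reference, top-5 accuracy counts a prediction as correct when the true class is among the five highest-scoring classes. The sketch below is a minimal illustration of that definition, not the evaluation code used by any listed paper; the function name `top_k_accuracy` and its inputs (per-sample score lists and integer labels) are assumptions made for the example.

```python
def top_k_accuracy(scores, labels, k=5):
    """Percentage of samples whose true label is among the k highest scores.

    scores: list of per-sample score lists, one score per class (hypothetical input).
    labels: list of integer class indices, one per sample (hypothetical input).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)
```

Leaderboard entries are reported as percentages, so the function returns a value between 0 and 100.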

Results

#	Model	Top-5 Accuracy	Extra Data	Paper	Date	Code
1	UMT-L (ViT-L/16)	96.7	Yes	Unmasked Teacher: Towards Training-Efficient Video Foundation Models	2023-03-28	Code
2	TubeViT-L	96.6	No	Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning	2022-12-06	Code
3	MTV-H (WTS 60M)	96.2	Yes	Multiview Transformers for Video Recognition	2022-01-12	Code
4	UniFormerV2-L	96.2	Yes	-	-	Code
5	MaskFeat (no extra data, MViT-L)	95.7	No	Masked Feature Prediction for Self-Supervised Visual Pre-Training	2021-12-16	Code
6	mPLUG-2	94.9	Yes	mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video	2023-02-01	Code
7	CoVeR (JFT-3B)	94.9	Yes	Co-training Transformer with Videos and Images Improves Action Recognition	2021-12-14	-
8	MViTv2-L (ImageNet-21k pretrain)	94.9	Yes	MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	2021-12-02	Code
9	CoVeR (JFT-300M)	94.2	Yes	Co-training Transformer with Videos and Images Improves Action Recognition	2021-12-14	-
10	MViTv2-B	93.2	No	MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	2021-12-02	Code
11	En-VidTr-L	89.4	No	VidTr: Video Transformer Without Convolutions	2021-04-23	-
12	VidTr-L	89	No	VidTr: Video Transformer Without Convolutions	2021-04-23	-
13	VidTr-M	88.3	No	VidTr: Video Transformer Without Convolutions	2021-04-23	-
14	VidTr-S	87.7	No	VidTr: Video Transformer Without Convolutions	2021-04-23	-
15	SRTG r3d-101	76.82	No	Learn to cycle: Time-consistent feature discovery for action recognition	2020-06-15	Code
16	SRTG r(2+1)d-50	74.62	No	Learn to cycle: Time-consistent feature discovery for action recognition	2020-06-15	Code
17	SRTG r3d-50	74.17	No	Learn to cycle: Time-consistent feature discovery for action recognition	2020-06-15	Code
18	SRTG r(2+1)d-34	73.23	No	Learn to cycle: Time-consistent feature discovery for action recognition	2020-06-15	Code
19	SRTG r3d-34	72.68	No	Learn to cycle: Time-consistent feature discovery for action recognition	2020-06-15	Code
