Video on MiT
Metric: Top 5 Accuracy (higher is better)
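As a rough illustration of the metric, top-5 accuracy is commonly defined as the percentage of clips whose true label appears among a model's five highest-scoring classes. The sketch below assumes per-class scores are available as plain lists. The function name `top_k_accuracy` and the toy data are hypothetical and are not taken from the leaderboard or from any listed paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this clip.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Hypothetical toy data: 3 clips, 6 classes (not leaderboard data).
toy_scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 5 is ranked last: miss
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],  # true class 0 is ranked first: hit
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],  # true class 3 is ranked third: hit
]
toy_labels = [5, 0, 3]
toy_result = top_k_accuracy(toy_scores, toy_labels, k=5)
```

With this toy input, two of the three clips count as hits, so the function returns about 66.7. Individual papers may differ in how they sample clips and crops before averaging scores, so figures computed this way are only comparable when the evaluation protocol matches.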
Results
| # | Model | Top 5 Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | UMT-L (ViT-L/16) | 78.2 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 2 | UniFormerV2-L | 76.9 | Yes | - | - | Code |
| 3 | MTV-H (WTS 60M) | 75.7 | Yes | Multiview Transformers for Video Recognition | 2022-01-12 | Code |
| 4 | CoVeR(JFT-3B) | 75.4 | Yes | Co-training Transformer with Videos and Images I... | 2021-12-14 | - |
| 5 | CoVeR(JFT-300M) | 73.9 | Yes | Co-training Transformer with Videos and Images I... | 2021-12-14 | - |
| 6 | VATT-Large | 67.7 | Yes | VATT: Transformers for Multimodal Self-Supervise... | 2021-04-22 | Code |
| 7 | VTN | 65.4 | Yes | Video Transformer Network | 2021-02-01 | Code |
| 8 | ViViT-L/16x2 | 64.9 | Yes | ViViT: A Video Vision Transformer | 2021-03-29 | Code |
| 9 | MBT (AV) | 61.2 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |
| 10 | SRTG r3d-101 | 58.49 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 11 | SRTG r(2+1)d-50 | 56.8 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 12 | SRTG r3d-50 | 55.65 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 13 | SRTG r(2+1)d-34 | 54.18 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 14 | TRN-Multiscale | 53.87 | No | Moments in Time Dataset: one million videos for ... | 2018-01-09 | Code |
| 15 | SRTG r3d-34 | 52.35 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |