Video on MiT
Metric: Top 5 Accuracy (higher is better)
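The page does not spell out how the metric is computed. Assuming the usual definition (a prediction counts as correct when the true label is among the five highest-scoring classes), a minimal sketch looks like the following. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from any of the listed papers.

```python
def top_k_accuracy(scores, labels, k=5):
    """Return the percentage of samples whose true label is among
    the k highest-scoring classes (assumed definition of Top-k accuracy)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: two samples, six classes (illustrative values only).
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 5 has the lowest score: miss
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 0 is fifth-highest: hit
]
labels = [5, 0]
result = top_k_accuracy(scores, labels, k=5)
```

Under this definition, a leaderboard value such as 78.2 would mean the true label was among the model's five top-ranked classes for 78.2% of the evaluation videos.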
Results
| # | Model | Top 5 Accuracy | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----------------|------------|------|-------|------|------|
| 1 | UMT-L (ViT-L/16) | 78.2 | Yes | Yes | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 2023-03-28 | Yes |
| 2 | UniFormerV2-L | 76.9 | Yes | - | No paper | - | Yes |
| 3 | MTV-H (WTS 60M) | 75.7 | Yes | Yes | Multiview Transformers for Video Recognition | 2022-01-12 | Yes |
| 4 | CoVeR(JFT-3B) | 75.4 | Yes | Yes | Co-training Transformer with Videos and Images Improves Action Recognition | 2021-12-14 | - |
| 5 | CoVeR(JFT-300M) | 73.9 | Yes | - | Co-training Transformer with Videos and Images Improves Action Recognition | 2021-12-14 | - |
| 6 | VATT-Large | 67.7 | Yes | Yes | VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | 2021-04-22 | Yes |
| 7 | VTN | 65.4 | Yes | Yes | Video Transformer Network | 2021-02-01 | Yes |
| 8 | ViViT-L/16x2 | 64.9 | Yes | - | ViViT: A Video Vision Transformer | 2021-03-29 | Yes |
| 9 | MBT (AV) | 61.2 | No | - | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Yes |
| 10 | SRTG r3d-101 | 58.49 | No | Yes | Learn to cycle: Time-consistent feature discovery for action recognition | 2020-06-15 | Yes |
| 11 | SRTG r(2+1)d-50 | 56.8 | No | - | Learn to cycle: Time-consistent feature discovery for action recognition | 2020-06-15 | Yes |
| 12 | SRTG r3d-50 | 55.65 | No | - | Learn to cycle: Time-consistent feature discovery for action recognition | 2020-06-15 | Yes |
| 13 | SRTG r(2+1)d-34 | 54.18 | No | - | Learn to cycle: Time-consistent feature discovery for action recognition | 2020-06-15 | Yes |
| 14 | TRN-Multiscale | 53.87 | No | Yes | Moments in Time Dataset: one million videos for event understanding | 2018-01-09 | Yes |
| 15 | SRTG r3d-34 | 52.35 | No | - | Learn to cycle: Time-consistent feature discovery for action recognition | 2020-06-15 | Yes |