Activity Recognition on HMDB51
Metric: Top-1 Accuracy (higher is better)
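Top-1 accuracy is the share of test clips whose single highest-scoring predicted class matches the ground-truth label. This page reports it as a percentage. The sketch below is minimal and makes some assumptions. It assumes per-clip class scores are already available as plain Python lists. The helper name `top1_accuracy` is hypothetical. The page does not describe any listed paper's evaluation protocol.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent (hypothetical helper).

    scores[i] holds one score per class for clip i (e.g. 51 entries for HMDB51);
    labels[i] is the index of the true class for clip i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = 0
    for clip_scores, label in zip(scores, labels):
        # argmax: index of the highest score (the first one wins on ties)
        predicted = max(range(len(clip_scores)), key=lambda k: clip_scores[k])
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Four toy clips over three classes; only the third clip is misclassified.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]
toy_labels = [1, 0, 1, 0]
print(top1_accuracy(toy_scores, toy_labels))  # 75.0
```

Higher is better: a model scores 100.0 only if its top-ranked class is correct for every clip.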
| # | Model | Top-1 Accuracy (%) | Extra Data | Paper | Date | Code | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | MVD (ViT-B) | 79.7 | No | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | 2022-12-08 | Yes | Yes |
| 2 | M3Video | 78.0 | No | Masked Motion Encoding for Self-Supervised Video Representation Learning | 2022-10-12 | Yes | Yes |
| 3 | pBYOL | 75.0 | No | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning | 2021-04-29 | Yes | Yes |
| 4 | SCE (R3D-50) | 74.7 | No | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning | 2022-12-21 | Yes | |
| 5 | VideoMAE | 73.3 | No | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | 2022-03-23 | Yes | |
| 6 | BraVe:V-FA (TSM-50x2) | 70.5 | No | Broaden Your Views for Self-Supervised Video Learning | 2021-03-30 | Yes | Yes |
| 7 | CVRL (R3D-152 2x; K600) | 69.9 | No | Spatiotemporal Contrastive Video Representation Learning | 2020-08-09 | Yes | Yes |
| 8 | XKD (ViT-B/112/16) | 69.0 | No | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | 2022-11-25 | Yes | |
| 9 | XDC | 68.9 | No | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Yes | Yes |
| 10 | CVRL (R3D-50; K600) | 68.0 | No | Spatiotemporal Contrastive Video Representation Learning | 2020-08-09 | Yes | |
| 11 | CrissCross (AudioSet) | 66.8 | No | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | 2021-11-09 | Yes | |
| 12 | CVRL (R3D-50; K400) | 66.7 | No | Spatiotemporal Contrastive Video Representation Learning | 2020-08-09 | Yes | |
| 13 | XDC | 66.5 | No | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Yes | |
| 14 | XKD-Modality-Agnostic (ViT-B/112/16) | 65.9 | No | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | 2022-11-25 | Yes | |
| 15 | VideoMS (ViT-B) | 65.8 | No | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | 2022-11-19 | Yes | |
| 16 | AVID+CMA (Modified R2+1D-18 on AudioSet) | 64.7 | No | Audio-Visual Instance Discrimination with Cross-Modal Agreement | 2020-04-27 | Yes | |
| 17 | RSPNet | 64.7 | No | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning | 2020-10-27 | Yes | |
| 18 | CrissCross (Kinetics400) | 64.7 | No | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | 2021-11-09 | Yes | |
| 19 | ELo | 64.5 | No | Evolving Losses for Unsupervised Video Representation Learning | 2020-02-26 | No | |
| 20 | AVID (Modified R2+1D-18 on AudioSet) | 64.1 | No | Audio-Visual Instance Discrimination with Cross-Modal Agreement | 2020-04-27 | Yes | |
| 21 | XDC | 63.7 | No | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Yes | |
| 22 | VideoMAE (no extra data) | 62.6 | No | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | 2022-03-23 | Yes | |
| 23 | ViCC (S3D; R+F) | 62.2 | No | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | 2021-06-18 | Yes | |
| 24 | ViCC (R2+1D; R+F) | 61.5 | No | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | 2021-06-18 | Yes | |
| 25 | AVID+CMA (Modified R2+1D-18 on Kinetics) | 60.8 | No | Audio-Visual Instance Discrimination with Cross-Modal Agreement | 2020-04-27 | Yes | |
| 26 | CrissCross (Kinetics-Sound) | 60.5 | No | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | 2021-11-09 | Yes | |
| 27 | AVID (Modified R2+1D-18 on Kinetics) | 59.9 | No | Audio-Visual Instance Discrimination with Cross-Modal Agreement | 2020-04-27 | Yes | |
| 28 | MCN (R3D-18; RGB) | 54.8 | No | Self-Supervised Video Representation Learning with Meta-Contrastive Network | 2021-08-19 | No | |
| 29 | MCN (R2+1D; RGB) | 54.5 | No | Self-Supervised Video Representation Learning with Meta-Contrastive Network | 2021-08-19 | No | |
| 30 | SLIC (R3D-18) | 54.5 | No | SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | 2022-06-25 | Yes | |
| 31 | TCLR (R3D-18) | 52.9 | No | TCLR: Temporal Contrastive Learning for Video Representation | 2021-01-20 | Yes | |
| 32 | XDC | 52.6 | No | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Yes | |
| 33 | ViCC (R2+1D; RGB) | 52.4 | No | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | 2021-06-18 | Yes | |
| 34 | CoCLR | 46.1 | No | Self-supervised Co-training for Video Representation Learning | 2020-10-19 | Yes | |
| 35 | PCL (ResNet-18) | 43.2 | No | Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning | 2020-10-29 | Yes | |
| 36 | ViCC (S3D; RGB) | 38.5 | No | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | 2021-06-18 | Yes | |
| 37 | IIC (R3D) | 38.3 | No | Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | 2020-08-06 | Yes | |
| 38 | TCE (ResNet-50) | 36.6 | No | Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | 2020-03-21 | Yes | |
| 39 | DPC (Modified 3D ResNet-34) | 35.7 | No | Video Representation Learning by Dense Predictive Coding | 2019-09-10 | Yes | Yes |
| 40 | DPC (Modified 3D ResNet-18) | 34.5 | No | Video Representation Learning by Dense Predictive Coding | 2019-09-10 | Yes | |
| 41 | TCE (ResNet-18) | 34.2 | No | Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | 2020-03-21 | Yes | |
| 42 | 3D RotNet (3D ResNet-18) | 33.7 | No | Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction | 2018-11-28 | No | |
| 43 | 3D Cubic Puzzles (3D ResNet-18) | 33.7 | No | Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles | 2018-11-24 | No | Yes |
| 44 | VCP (R3D) | 31.5 | No | Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning | 2020-01-02 | Yes | |
| 45 | Video Clip Ordering (R3D) | 29.5 | No | No paper | - | No | |
| 46 | OPN (VGG-M-2048) | 23.8 | No | Unsupervised Representation Learning by Sorting Sequences | 2017-08-03 | Yes | Yes |
| 47 | Motion & Appearance (C3D) | 20.3 | No | Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | 2019-04-07 | Yes | |
| 48 | Shuffle and Learn (AlexNet) | 19.8 | No | Shuffle and Learn: Unsupervised Learning using Temporal Order Verification | 2016-03-28 | No | Yes |
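Ten entries on this page carry a SOTA flag. Those flags line up with a running record over time: each flagged result strictly exceeds every result dated before it. This reading is inferred from the listed scores and dates, not stated by the page. The minimal sketch below checks the rule on a handful of rows copied from the leaderboard. `sota_progression` is a hypothetical helper. Two of its choices are assumptions: it skips undated entries, and it puts the higher score first when two entries share a date.

```python
from datetime import date
from typing import Optional


def sota_progression(entries: list[tuple[str, float, Optional[date]]]) -> list[str]:
    """Return names of entries whose score beats every earlier-dated result (hypothetical helper)."""
    # Assumption: entries without a publication date cannot be placed in time, so skip them.
    dated = [e for e in entries if e[2] is not None]
    # Oldest first; on the same date, the highest score comes first so only it can be flagged.
    dated.sort(key=lambda e: (e[2], -e[1]))
    record = float("-inf")
    flagged = []
    for name, score, _published in dated:
        if score > record:  # strict: merely matching the record is not a new record
            record = score
            flagged.append(name)
    return flagged


rows = [
    ("Shuffle and Learn (AlexNet)", 19.8, date(2016, 3, 28)),
    ("OPN (VGG-M-2048)", 23.8, date(2017, 8, 3)),
    ("3D Cubic Puzzles (3D ResNet-18)", 33.7, date(2018, 11, 24)),
    ("3D RotNet (3D ResNet-18)", 33.7, date(2018, 11, 28)),
    ("DPC (Modified 3D ResNet-34)", 35.7, date(2019, 9, 10)),
    ("DPC (Modified 3D ResNet-18)", 34.5, date(2019, 9, 10)),
    ("XDC", 68.9, date(2019, 11, 28)),
    ("Video Clip Ordering (R3D)", 29.5, None),
]
print(sota_progression(rows))
# ['Shuffle and Learn (AlexNet)', 'OPN (VGG-M-2048)', '3D Cubic Puzzles (3D ResNet-18)',
#  'DPC (Modified 3D ResNet-34)', 'XDC']
```

With strict comparison, 3D RotNet is not flagged, and the page does not flag it either. It reports 33.7, the same score 3D Cubic Puzzles reached four days earlier.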