Action Recognition on Something-Something V1
Metric: Top 5 Accuracy (higher is better)
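For readers unfamiliar with the metric, here is a minimal sketch of how a top-5 accuracy figure can be computed from per-class scores. The function name `top_k_accuracy` and the toy scores are hypothetical illustrations, not part of any leaderboard tooling. The sketch also ignores the per-entry evaluation details listed in the model names (number of clips, crops, ensembles), which affect how scores are produced before this step.

```python
def top_k_accuracy(scores, labels, k=5):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical model outputs)
    labels: list of true class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy data: 3 clips, 6 classes (made-up scores, for illustration only).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 has the highest score
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5 has the lowest score
    [0.05, 0.10, 0.40, 0.30, 0.10, 0.05],  # true class 3 has the second-highest score
]
labels = [1, 5, 3]

top5 = top_k_accuracy(scores, labels, k=5)
top1 = top_k_accuracy(scores, labels, k=1)
```

With six classes, the second clip is the only top-5 miss because its true class ranks last. Setting `k=1` gives ordinary top-1 accuracy on the same inputs.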
Results
# | Model | Top 5 Accuracy | Extra Data | Paper | Date | Code
1 | VideoMAE V2-g | 91.9 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | 2023-03-29 | Code
2 | Side4Video (EVA ViT-E/14) | 88.8 | No | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | 2023-11-27 | Code
3 | ATM | 88.6 | No | What Can Simple Arithmetic Operations Do for Temporal Modeling? | 2023-07-18 | Code
4 | UniFormerV2-L | 88 | Yes | - | - | Code
5 | TDS-CLIP-ViT-L/14(8frames) | 87.8 | No | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | 2024-08-20 | Code
6 | UniFormer-B (IN-1K + Kinetics400) | 87.3 | No | - | - | Code
7 | TRG (ResNet-50) | 86.1 | No | Temporal Reasoning Graph for Activity Recognition | 2019-08-27 | -
8 | UniFormer-B (IN-1K + Kinetics600) | 84.9 | No | - | - | Code
9 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, 2 clips) | 84.4 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code
10 | BQNEn (ImageNet + K400 pretrained) | 84.2 | No | Busy-Quiet Video Disentangling for Video Classification | 2021-03-29 | Code
11 | TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only) | 84.1 | No | TDN: Temporal Difference Networks for Efficient Action Recognition | 2020-12-18 | Code
12 | EAN ResNet50 (single clip, center crop, 8+16 ensemble, with sparse Transformer) | 83.9 | No | EAN: Event Adaptive Network for Enhanced Action Recognition | 2021-07-22 | Code
13 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, a single clip) | 83.9 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code
14 | MSNet-R50En (8+16 ensemble, ImageNet pretrained) | 83.8 | Yes | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code
15 | SELFYNet-TSM-R50 (16 frames, ImageNet pretrained) | 82.9 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code
16 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | 82.8 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code
17 | PAN ResNet101 (RGB only, no Flow) | 82.8 | No | PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | 2020-08-08 | Code
18 | RSANet-R50 (8+16 frames, ImageNet pretrained, a single clip) | 82.6 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code
19 | VoV3D-L (32frames, Kinetics pretrained, single) | 82.3 | Yes | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code
20 | MSNet-R50 (16 frames, ImageNet pretrained) | 82.3 | Yes | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code
21 | RNL+TSM Ensemble(R50+R101, ImageNet pretrained) | 82.2 | No | Region-based Non-local Operation for Video Classification | 2020-07-17 | Code
22 | RNL+TSM Ensemble(ResNet50, ImageNet pretrained) | 81.5 | No | Region-based Non-local Operation for Video Classification | 2020-07-17 | Code
23 | TSM+W3 (16 frames, ResNet50) | 81.3 | No | Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention | 2020-04-02 | -
24 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 81.1 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code
25 | VoV3D-M (32frames, Kinetics pretrained, single) | 80.43 | Yes | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code
26 | MSNet-R50 (8 frames, ImageNet pretrained) | 80.3 | No | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code
27 | RSANet-R50 (8 frames, ImageNet pretrained, a single clip) | 79.6 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code
28 | VoV3D-L (32frames, from scratch, single) | 78.7 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code
29 | S3D-G (ImageNet pretrained) | 78.7 | Yes | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | 2017-12-13 | Code
30 | TSMEn | 78.5 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code
31 | S3D | 78.1 | No | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | 2017-12-13 | Code
32 | VoV3D-M (32frames, from scratch, single) | 78 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code
33 | VoV3D-L (16frames, from scratch, single) | 78 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code
34 | TSM | 77.1 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code
35 | VoV3D-M (16frames, from scratch, single) | 76.9 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code

A "-" in the Paper, Date, or Code column means the leaderboard lists no paper, date, or code link for that entry.
Entries carrying a SOTA badge on the leaderboard: #1 VideoMAE V2-g, #7 TRG (ResNet-50), #29 S3D-G (ImageNet pretrained).