Action Recognition on Something-Something V1
Metric: Top 1 Accuracy (higher is better)
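As a rough illustration of the metric, top-1 accuracy is the percentage of clips whose highest-scoring predicted class matches the ground-truth label. The sketch below is a minimal, assumed implementation: the function name `top1_accuracy` and the input layout (one list of per-class scores per clip) are hypothetical and not taken from any submission on this leaderboard.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent.

    scores: one row of per-class scores per clip (hypothetical layout).
    labels: ground-truth class index for each clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # The predicted class is the index of the largest score in the row.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 clips, 2 classes; 3 of the 4 predictions match the labels.
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
print(top1_accuracy(toy_scores, toy_labels))
```

Individual entries may differ in evaluation protocol (number of clips, crops, or ensembled models, as noted in several model names), so scores computed this way are only comparable under the same protocol.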
Results
| # | Model | Top 1 Accuracy | Extra Data | Paper | Date | Code | SOTA badge |
|---|---|---|---|---|---|---|---|
| 1 | InternVideo | 70 | Yes | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 2022-12-06 | Code | SOTA |
| 2 | VideoMAE V2-g | 68.7 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | 2023-03-29 | Code | - |
| 3 | Side4Video (EVA ViT-E/14 | 67.3 | No | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | 2023-11-27 | Code | - |
| 4 | ATM | 65.6 | No | What Can Simple Arithmetic Operations Do for Temporal Modeling? | 2023-07-18 | Code | - |
| 5 | TAdaFormer-L/14 | 63.7 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Code | - |
| 6 | TDS-CLIP-ViT-L/14(8frames) | 63 | No | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | 2024-08-20 | Code | - |
| 7 | UniFormerV2-L | 62.7 | Yes | - | - | Code | - |
| 8 | StructVit-B-4-1 | 61.3 | No | Learning Correlation Structures for Vision Transformers | 2024-04-05 | - | - |
| 9 | UniFormer-B (IN-1K + Kinetics400) | 60.9 | No | - | - | Code | - |
| 10 | TAdaConvNeXtV2-B | 60.7 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Code | - |
| 11 | TPS | 58.3 | No | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | 2022-07-27 | Code | SOTA |
| 12 | MSMA (8+16frames) | 57.9 | No | - | - | - | - |
| 13 | UniFormer-B (IN-1K + Kinetics600) | 57.6 | No | - | - | Code | - |
| 14 | SIFA | 57.3 | No | Stand-Alone Inter-Frame Attention in Video Models | 2022-06-14 | Code | SOTA |
| 15 | EAN ResNet50 (single clip, center crop,8+16 ensemble, with sparse Transformer) | 57.2 | No | EAN: Event Adaptive Network for Enhanced Action Recognition | 2021-07-22 | Code | SOTA |
| 16 | TCM (Ensemble) | 57.2 | No | Motion-driven Visual Tempo Learning for Video-based Action Recognition | 2022-02-24 | Code | - |
| 17 | BQNEn (ImageNet + K400 pretrained) | 57.1 | No | Busy-Quiet Video Disentangling for Video Classification | 2021-03-29 | Code | SOTA |
| 18 | TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only) | 56.8 | No | TDN: Temporal Difference Networks for Efficient Action Recognition | 2020-12-18 | Code | SOTA |
| 19 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, 2 clips) | 56.6 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code | - |
| 20 | CT-Net Ensemble (R50, 8+12+16+24) | 56.6 | No | CT-Net: Channel Tensorization Network for Video Classification | 2021-06-03 | Code | - |
| 21 | MoDS (8+16frames) | 56.6 | No | - | - | - | - |
| 22 | MLP-3D | 56.5 | No | MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing | 2022-06-13 | - | - |
| 23 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | 56.1 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code | - |
| 24 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, a single clip) | 55.8 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code | - |
| 25 | RSANet-R50 (8+16 frames, ImageNet pretrained, a single clip) | 55.5 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code | - |
| 26 | PAN ResNet101 (RGB only, no Flow) | 55.3 | No | PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | 2020-08-08 | Code | SOTA |
| 27 | GSM Ensemble InceptionV3 (ImageNet pretrained) | 55.16 | Yes | Gate-Shift Networks for Video Action Recognition | 2019-12-01 | Code | SOTA |
| 28 | MSNet-R50En (ensemble) | 55.1 | Yes | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code | - |
| 29 | AE-Net (8+16frames) | 55 | No | - | - | - | - |
| 30 | VoV3D-L (32frames, Kinetics pretrained, single) | 54.59 | Yes | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 31 | MSNet-R50En (8+16 ensemble, ImageNet pretrained) | 54.4 | Yes | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code | - |
| 32 | SELFYNet-TSM-R50 (16 frames, ImageNet pretrained) | 54.3 | Yes | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | 2021-02-14 | Code | - |
| 33 | RNL+TSM Ensemble(R50+R101, ImageNet pretrained) | 54.1 | No | Region-based Non-local Operation for Video Classification | 2020-07-17 | Code | - |
| 34 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 54 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code | - |
| 35 | MVFNet-R50EN | 54 | No | MVFNet: Multi-View Fusion Network for Efficient Video Recognition | 2020-12-13 | Code | - |
| 36 | STPG (8+16frames) | 53.5 | No | - | - | - | - |
| 37 | GB + DF + LB (ResNet152, ImageNet pretrained) | 53.4 | Yes | Action recognition with spatial-temporal discriminative filter banks | 2019-08-20 | - | SOTA |
| 38 | ip-CSN-152 (IG-65M pretraining) | 53.3 | No | Video Classification with Channel-Separated Convolutional Networks | 2019-04-04 | Code | SOTA |
| 39 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 53 | Yes | - | - | Code | - |
| 40 | RNL+TSM Ensemble(ResNet50, ImageNet pretrained) | 52.7 | No | Region-based Non-local Operation for Video Classification | 2020-07-17 | Code | - |
| 41 | VoV3D-M (32frames, Kinetics pretrained, single) | 52.68 | Yes | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 42 | TSM+W3 (16 frames, ResNet50) | 52.6 | No | Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention | 2020-04-02 | - | - |
| 43 | AK-Net | 52.5 | No | Action Keypoint Network for Efficient Video Recognition | 2022-01-17 | - | - |
| 44 | MSNet-R50 (16 frames, ImageNet pretrained) | 52.1 | Yes | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code | - |
| 45 | ir-CSN-152 (IG-65M pretraining) | 52.1 | No | Video Classification with Channel-Separated Convolutional Networks | 2019-04-04 | Code | - |
| 46 | RSANet-R50 (8 frames, ImageNet pretrained, a single clip) | 51.9 | No | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Code | - |
| 47 | GSM InceptionV3 (16 frames, ImageNet pretrained) | 51.68 | Yes | Gate-Shift Networks for Video Action Recognition | 2019-12-01 | Code | - |
| 48 | R(2+1)D-152 (IG-65M pretraining) | 51.6 | No | Video Classification with Channel-Separated Convolutional Networks | 2019-04-04 | Code | - |
| 49 | MSNet-R50 (8 frames, ImageNet pretrained) | 50.9 | No | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | 2020-07-20 | Code | - |
| 50 | TSM (RGB + Flow) | 50.7 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code | SOTA |
| 51 | STM (16 frames, ImageNet pretraining) | 50.7 | No | STM: SpatioTemporal and Motion Encoding for Action Recognition | 2019-08-07 | - | - |
| 52 | VoV3D-L (32frames, from scratch, single) | 50.6 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 53 | ResNet50 I3D (Moments pretrained) | 50 | Yes | Moments in Time Dataset: one million videos for event understanding | 2018-01-09 | Code | SOTA |
| 54 | VoV3D-M (32frames, from scratch, single) | 49.8 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 55 | TSMEn | 49.7 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code | - |
| 56 | TRG (Inception-V3) | 49.7 | No | Temporal Reasoning Graph for Activity Recognition | 2019-08-27 | - | - |
| 57 | TRG (ResNet-50) | 49.5 | No | Temporal Reasoning Graph for Activity Recognition | 2019-08-27 | - | - |
| 58 | VoV3D-L (16frames, from scratch, single) | 49.5 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 59 | ir-CSN-152 | 49.3 | No | Video Classification with Channel-Separated Convolutional Networks | 2019-04-04 | Code | - |
| 60 | RSTG (Kinetics pretrained) | 49.2 | Yes | Recurrent Space-time Graph Neural Networks | 2019-04-11 | Code | - |
| 61 | ResNet50 I3D (Kinetics pretrained) | 48.6 | Yes | Moments in Time Dataset: one million videos for event understanding | 2018-01-09 | Code | - |
| 62 | ir-CSN-101 | 48.4 | No | Video Classification with Channel-Separated Convolutional Networks | 2019-04-04 | Code | - |
| 63 | S3D-G (ImageNet pretrained) | 48.2 | Yes | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | 2017-12-13 | Code | SOTA |
| 64 | VoV3D-M (16frames, from scratch, single) | 48.1 | No | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | 2020-12-01 | Code | - |
| 65 | S3D | 47.3 | No | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | 2017-12-13 | Code | - |
| 66 | TSM | 47.2 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code | - |
| 67 | ECO-Net (ImageNet pretrained) | 46.4 | Yes | ECO: Efficient Convolutional Network for Online Video Understanding | 2018-04-24 | Code | - |
| 68 | ECO-Net | 46.4 | No | ECO: Efficient Convolutional Network for Online Video Understanding | 2018-04-24 | Code | - |
| 69 | NL I3D + GCN | 46.1 | No | Videos as Space-Time Region Graphs | 2018-06-05 | - | - |
| 70 | NL I3D | 44.4 | No | Non-local Neural Networks | 2017-11-21 | Code | SOTA |
| 71 | Motion Feature Net | 43.9 | No | Motion Feature Network: Fixed Motion Filter for Action Recognition | 2018-07-26 | - | - |
| 72 | Motion Feature Net | 43.9 | No | Motion Feature Network: Fixed Motion Filter for Action Recognition | 2018-07-26 | - | - |
| 73 | 2-Stream TRN | 42.01 | No | Temporal Relational Reasoning in Videos | 2017-11-22 | Code | - |
| 74 | 2-Stream TRN | 42.01 | No | Temporal Relational Reasoning in Videos | 2017-11-22 | Code | - |
| 75 | HF-TSN (ImageNet pretraining) | 41.97 | Yes | Hierarchical Feature Aggregation Networks for Video Action Recognition | 2019-05-29 | - | - |
| 76 | MARS+RGB+Flow (16 frames, Kinetics pretrained) | 40.4 | No | - | - | Code | - |
| 77 | M-TRN | 34.4 | No | Temporal Relational Reasoning in Videos | 2017-11-22 | Code | - |