Activity Recognition on Something-Something V2
Metric: Parameters (higher is better)
Results
| # | Model | Parameters | Extra Data | SOTA | Paper | Date | Code |
|---|-------|------------|------------|------|-------|------|------|
| 1 | InternVideo2-6B | 2131 | Yes | SOTA | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 2024-03-22 | Yes |
| 2 | VideoMAE V2-g | 1013 | Yes | SOTA | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | 2023-03-29 | Yes |
| 3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | 633 | Yes | SOTA | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | 2022-12-08 | Yes |
| 4 | MAR (50% mask, ViT-L, 16x4) | 311 | No | SOTA | MAR: Masked Autoencoders for Efficient Action Recognition | 2022-07-24 | Yes |
| 5 | MAR (75% mask, ViT-L, 16x4) | 311 | No | | MAR: Masked Autoencoders for Efficient Action Recognition | 2022-07-24 | Yes |
| 6 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | 305 | Yes | | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | 2022-12-08 | Yes |
| 7 | VideoMAE (no extra data, ViT-L, 32x2) | 305 | No | SOTA | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | 2022-03-23 | Yes |
| 8 | VideoMAE (no extra data, ViT-L, 16frame) | 305 | No | | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | 2022-03-23 | Yes |
| 9 | MaskFeat (Kinetics600 pretrain, MViT-L) | 218 | Yes | SOTA | Masked Feature Prediction for Self-Supervised Visual Pre-Training | 2021-12-16 | Yes |
| 10 | MViTv2-L (IN-21K + Kinetics400 pretrain) | 213.1 | No | SOTA | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | 2021-12-02 | Yes |
| 11 | MAR (50% mask, ViT-B, 16x4) | 94 | No | | MAR: Masked Autoencoders for Efficient Action Recognition | 2022-07-24 | Yes |
| 12 | MAR (75% mask, ViT-B, 16x4) | 94 | No | | MAR: Masked Autoencoders for Efficient Action Recognition | 2022-07-24 | Yes |
| 13 | BEVT (IN-1K + Kinetics400 pretrain) | 89 | Yes | | BEVT: BERT Pretraining of Video Transformers | 2021-12-02 | Yes |
| 14 | Swin-B (IN-21K + Kinetics400 pretrain) | 89 | Yes | SOTA | Video Swin Transformer | 2021-06-24 | Yes |
| 15 | MVD (Kinetics400 pretrain, ViT-B, 16 frame) | 87 | Yes | | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | 2022-12-08 | Yes |
| 16 | AMD(ViT-B/16) | 87 | No | | Asymmetric Masked Distillation for Pre-Training Small Foundation Models | 2023-11-06 | No |
| 17 | VideoMAE (no extra data, ViT-B, 16frame) | 87 | No | | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | 2022-03-23 | Yes |
| 18 | CT-Net Ensemble (R50, 8+12+16+24) | 83.8 | Yes | SOTA | CT-Net: Channel Tensorization Network for Video Classification | 2021-06-03 | Yes |
| 19 | MorphMLP-B (IN-1K) | 68.5 | Yes | | MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning | 2021-11-24 | Yes |
| 20 | MViTv2-B (IN-21K + Kinetics400 pretrain) | 51.1 | No | | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | 2021-12-02 | Yes |
| 21 | UniFormer-B (IN-1K + Kinetics400 pretrain) | 50.1 | Yes | | No paper | - | Yes |
| 22 | MViT-B, 32x3(Kinetics600 pretrain) | 36.6 | Yes | SOTA | Multiscale Vision Transformers | 2021-04-22 | Yes |
| 23 | GC-TDN Ensemble (R50,8+16) | 27.4 | Yes | | Group Contextualization for Video Recognition | 2022-03-18 | Yes |
| 24 | MVD (Kinetics400 pretrain, ViT-S, 16 frame) | 22 | Yes | | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | 2022-12-08 | Yes |
| 25 | AMD(ViT-S/16) | 22 | No | | Asymmetric Masked Distillation for Pre-Training Small Foundation Models | 2023-11-06 | No |
| 26 | UniFormer-S (IN-1K + Kinetics600 pretrain) | 21.4 | Yes | | No paper | - | Yes |