Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Activity Recognition on Something-Something V2

Metric: Parameters (higher is better)

Results

| # | Model | Parameters | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | InternVideo2-6B | 2131 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 2 | VideoMAE V2-g | 1013 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders w... | 2023-03-29 | Code |
| 3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | 633 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 4 | MAR (50% mask, ViT-L, 16x4) | 311 | No | MAR: Masked Autoencoders for Efficient Action Re... | 2022-07-24 | Code |
| 5 | MAR (75% mask, ViT-L, 16x4) | 311 | No | MAR: Masked Autoencoders for Efficient Action Re... | 2022-07-24 | Code |
| 6 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | 305 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 7 | VideoMAE (no extra data, ViT-L, 32x2) | 305 | No | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 8 | VideoMAE (no extra data, ViT-L, 16frame) | 305 | No | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 9 | MaskFeat (Kinetics600 pretrain, MViT-L) | 218 | Yes | Masked Feature Prediction for Self-Supervised Vi... | 2021-12-16 | Code |
| 10 | MViTv2-L (IN-21K + Kinetics400 pretrain) | 213.1 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 11 | MAR (50% mask, ViT-B, 16x4) | 94 | No | MAR: Masked Autoencoders for Efficient Action Re... | 2022-07-24 | Code |
| 12 | MAR (75% mask, ViT-B, 16x4) | 94 | No | MAR: Masked Autoencoders for Efficient Action Re... | 2022-07-24 | Code |
| 13 | BEVT (IN-1K + Kinetics400 pretrain) | 89 | Yes | BEVT: BERT Pretraining of Video Transformers | 2021-12-02 | Code |
| 14 | Swin-B (IN-21K + Kinetics400 pretrain) | 89 | Yes | Video Swin Transformer | 2021-06-24 | Code |
| 15 | MVD (Kinetics400 pretrain, ViT-B, 16 frame) | 87 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 16 | AMD(ViT-B/16) | 87 | No | Asymmetric Masked Distillation for Pre-Training ... | 2023-11-06 | - |
| 17 | VideoMAE (no extra data, ViT-B, 16frame) | 87 | No | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 18 | CT-Net Ensemble (R50, 8+12+16+24) | 83.8 | Yes | CT-Net: Channel Tensorization Network for Video ... | 2021-06-03 | Code |
| 19 | MorphMLP-B (IN-1K) | 68.5 | Yes | MorphMLP: An Efficient MLP-Like Backbone for Spa... | 2021-11-24 | Code |
| 20 | MViTv2-B (IN-21K + Kinetics400 pretrain) | 51.1 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 21 | UniFormer-B (IN-1K + Kinetics400 pretrain) | 50.1 | Yes | - | - | Code |
| 22 | MViT-B, 32x3(Kinetics600 pretrain) | 36.6 | Yes | Multiscale Vision Transformers | 2021-04-22 | Code |
| 23 | GC-TDN Ensemble (R50,8+16) | 27.4 | Yes | Group Contextualization for Video Recognition | 2022-03-18 | Code |
| 24 | MVD (Kinetics400 pretrain, ViT-S, 16 frame) | 22 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 25 | AMD(ViT-S/16) | 22 | No | Asymmetric Masked Distillation for Pre-Training ... | 2023-11-06 | - |
| 26 | UniFormer-S (IN-1K + Kinetics600 pretrain) | 21.4 | Yes | - | - | Code |