Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Recognition on Something-Something V1

Metric: Top-1 Accuracy (%, higher is better)
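
Top-1 accuracy counts a video as correct only when the model's single highest-scoring class matches the ground-truth label, and reports the fraction of correct videos as a percentage. The sketch below illustrates that definition on toy scores. The function name `top1_accuracy` and the input layout (one list of per-class scores per video) are assumptions for illustration, not part of any benchmark toolkit.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent (hypothetical helper).

    scores[i][c] is the model's score for class c on video i; labels[i] is the
    ground-truth class index. A prediction is correct when the highest-scoring
    class equals the label (ties resolve to the lowest class index).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 videos, 3 classes (Something-Something V1 itself has 174).
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.3, 0.4, 0.3],  # predicts class 1
]
labels = [1, 0, 1, 1]
print(top1_accuracy(scores, labels))  # 75.0
```

Three of the four toy predictions match their labels, so the result is 75.0 on the same percentage scale as the leaderboard column.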


Results

| # | Model | Top-1 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | InternVideo | 70 | Yes | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 2 | VideoMAE V2-g | 68.7 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders w... | 2023-03-29 | Code |
| 3 | Side4Video (EVA ViT-E/14) | 67.3 | No | Side4Video: Spatial-Temporal Side Network for Me... | 2023-11-27 | Code |
| 4 | ATM | 65.6 | No | What Can Simple Arithmetic Operations Do for Tem... | 2023-07-18 | Code |
| 5 | TAdaFormer-L/14 | 63.7 | Yes | Temporally-Adaptive Models for Efficient Video U... | 2023-08-10 | Code |
| 6 | TDS-CLIP-ViT-L/14 (8 frames) | 63 | No | TDS-CLIP: Temporal Difference Side Network for I... | 2024-08-20 | Code |
| 7 | UniFormerV2-L | 62.7 | Yes | - | - | Code |
| 8 | StructVit-B-4-1 | 61.3 | No | Learning Correlation Structures for Vision Trans... | 2024-04-05 | - |
| 9 | UniFormer-B (IN-1K + Kinetics400) | 60.9 | No | - | - | Code |
| 10 | TAdaConvNeXtV2-B | 60.7 | Yes | Temporally-Adaptive Models for Efficient Video U... | 2023-08-10 | Code |
| 11 | TPS | 58.3 | No | Spatiotemporal Self-attention Modeling with Temp... | 2022-07-27 | Code |
| 12 | MSMA (8+16 frames) | 57.9 | No | - | - | - |
| 13 | UniFormer-B (IN-1K + Kinetics600) | 57.6 | No | - | - | Code |
| 14 | SIFA | 57.3 | No | Stand-Alone Inter-Frame Attention in Video Models | 2022-06-14 | Code |
| 15 | EAN ResNet50 (single clip, center crop, 8+16 ensemble, with sparse Transformer) | 57.2 | No | EAN: Event Adaptive Network for Enhanced Action ... | 2021-07-22 | Code |
| 16 | TCM (Ensemble) | 57.2 | No | Motion-driven Visual Tempo Learning for Video-ba... | 2022-02-24 | Code |
| 17 | BQNEn (ImageNet + K400 pretrained) | 57.1 | No | Busy-Quiet Video Disentangling for Video Classif... | 2021-03-29 | Code |
| 18 | TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only) | 56.8 | No | TDN: Temporal Difference Networks for Efficient ... | 2020-12-18 | Code |
| 19 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, 2 clips) | 56.6 | Yes | Learning Self-Similarity in Space and Time as Ge... | 2021-02-14 | Code |
| 20 | CT-Net Ensemble (R50, 8+12+16+24) | 56.6 | No | CT-Net: Channel Tensorization Network for Video ... | 2021-06-03 | Code |
| 21 | MoDS (8+16 frames) | 56.6 | No | - | - | - |
| 22 | MLP-3D | 56.5 | No | MLP-3D: A MLP-like 3D Architecture with Grouped ... | 2022-06-13 | - |
| 23 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | 56.1 | No | Relational Self-Attention: What's Missing in Att... | 2021-11-02 | Code |
| 24 | SELFYNet-TSM-R50En (8+16 frames, ImageNet pretrained, a single clip) | 55.8 | Yes | Learning Self-Similarity in Space and Time as Ge... | 2021-02-14 | Code |
| 25 | RSANet-R50 (8+16 frames, ImageNet pretrained, a single clip) | 55.5 | No | Relational Self-Attention: What's Missing in Att... | 2021-11-02 | Code |
| 26 | PAN ResNet101 (RGB only, no Flow) | 55.3 | No | PAN: Towards Fast Action Recognition via Learnin... | 2020-08-08 | Code |
| 27 | GSM Ensemble InceptionV3 (ImageNet pretrained) | 55.16 | Yes | Gate-Shift Networks for Video Action Recognition | 2019-12-01 | Code |
| 28 | MSNet-R50En (ensemble) | 55.1 | Yes | MotionSqueeze: Neural Motion Feature Learning fo... | 2020-07-20 | Code |
| 29 | AE-Net (8+16 frames) | 55 | No | - | - | - |
| 30 | VoV3D-L (32 frames, Kinetics pretrained, single) | 54.59 | Yes | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 31 | MSNet-R50En (8+16 ensemble, ImageNet pretrained) | 54.4 | Yes | MotionSqueeze: Neural Motion Feature Learning fo... | 2020-07-20 | Code |
| 32 | SELFYNet-TSM-R50 (16 frames, ImageNet pretrained) | 54.3 | Yes | Learning Self-Similarity in Space and Time as Ge... | 2021-02-14 | Code |
| 33 | RNL+TSM Ensemble (R50+R101, ImageNet pretrained) | 54.1 | No | Region-based Non-local Operation for Video Class... | 2020-07-17 | Code |
| 34 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 54 | No | Relational Self-Attention: What's Missing in Att... | 2021-11-02 | Code |
| 35 | MVFNet-R50EN | 54 | No | MVFNet: Multi-View Fusion Network for Efficient ... | 2020-12-13 | Code |
| 36 | STPG (8+16 frames) | 53.5 | No | - | - | - |
| 37 | GB + DF + LB (ResNet152, ImageNet pretrained) | 53.4 | Yes | Action recognition with spatial-temporal discrim... | 2019-08-20 | - |
| 38 | ip-CSN-152 (IG-65M pretraining) | 53.3 | No | Video Classification with Channel-Separated Conv... | 2019-04-04 | Code |
| 39 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 53 | Yes | - | - | Code |
| 40 | RNL+TSM Ensemble (ResNet50, ImageNet pretrained) | 52.7 | No | Region-based Non-local Operation for Video Class... | 2020-07-17 | Code |
| 41 | VoV3D-M (32 frames, Kinetics pretrained, single) | 52.68 | Yes | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 42 | TSM+W3 (16 frames, ResNet50) | 52.6 | No | Knowing What, Where and When to Look: Efficient ... | 2020-04-02 | - |
| 43 | AK-Net | 52.5 | No | Action Keypoint Network for Efficient Video Reco... | 2022-01-17 | - |
| 44 | MSNet-R50 (16 frames, ImageNet pretrained) | 52.1 | Yes | MotionSqueeze: Neural Motion Feature Learning fo... | 2020-07-20 | Code |
| 45 | ir-CSN-152 (IG-65M pretraining) | 52.1 | No | Video Classification with Channel-Separated Conv... | 2019-04-04 | Code |
| 46 | RSANet-R50 (8 frames, ImageNet pretrained, a single clip) | 51.9 | No | Relational Self-Attention: What's Missing in Att... | 2021-11-02 | Code |
| 47 | GSM InceptionV3 (16 frames, ImageNet pretrained) | 51.68 | Yes | Gate-Shift Networks for Video Action Recognition | 2019-12-01 | Code |
| 48 | R(2+1)D-152 (IG-65M pretraining) | 51.6 | No | Video Classification with Channel-Separated Conv... | 2019-04-04 | Code |
| 49 | MSNet-R50 (8 frames, ImageNet pretrained) | 50.9 | No | MotionSqueeze: Neural Motion Feature Learning fo... | 2020-07-20 | Code |
| 50 | TSM (RGB + Flow) | 50.7 | No | TSM: Temporal Shift Module for Efficient Video U... | 2018-11-20 | Code |
| 51 | STM (16 frames, ImageNet pretraining) | 50.7 | No | STM: SpatioTemporal and Motion Encoding for Acti... | 2019-08-07 | - |
| 52 | VoV3D-L (32 frames, from scratch, single) | 50.6 | No | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 53 | ResNet50 I3D (Moments pretrained) | 50 | Yes | Moments in Time Dataset: one million videos for ... | 2018-01-09 | Code |
| 54 | VoV3D-M (32 frames, from scratch, single) | 49.8 | No | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 55 | TSMEn | 49.7 | No | TSM: Temporal Shift Module for Efficient Video U... | 2018-11-20 | Code |
| 56 | TRG (Inception-V3) | 49.7 | No | Temporal Reasoning Graph for Activity Recognition | 2019-08-27 | - |
| 57 | TRG (ResNet-50) | 49.5 | No | Temporal Reasoning Graph for Activity Recognition | 2019-08-27 | - |
| 58 | VoV3D-L (16 frames, from scratch, single) | 49.5 | No | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 59 | ir-CSN-152 | 49.3 | No | Video Classification with Channel-Separated Conv... | 2019-04-04 | Code |
| 60 | RSTG (Kinetics pretrained) | 49.2 | Yes | Recurrent Space-time Graph Neural Networks | 2019-04-11 | Code |
| 61 | ResNet50 I3D (Kinetics pretrained) | 48.6 | Yes | Moments in Time Dataset: one million videos for ... | 2018-01-09 | Code |
| 62 | ir-CSN-101 | 48.4 | No | Video Classification with Channel-Separated Conv... | 2019-04-04 | Code |
| 63 | S3D-G (ImageNet pretrained) | 48.2 | Yes | Rethinking Spatiotemporal Feature Learning: Spee... | 2017-12-13 | Code |
| 64 | VoV3D-M (16 frames, from scratch, single) | 48.1 | No | Diverse Temporal Aggregation and Depthwise Spati... | 2020-12-01 | Code |
| 65 | S3D | 47.3 | No | Rethinking Spatiotemporal Feature Learning: Spee... | 2017-12-13 | Code |
| 66 | TSM | 47.2 | No | TSM: Temporal Shift Module for Efficient Video U... | 2018-11-20 | Code |
| 67 | ECO-Net (ImageNet pretrained) | 46.4 | Yes | ECO: Efficient Convolutional Network for Online ... | 2018-04-24 | Code |
| 68 | ECO-Net | 46.4 | No | ECO: Efficient Convolutional Network for Online ... | 2018-04-24 | Code |
| 69 | NL I3D + GCN | 46.1 | No | Videos as Space-Time Region Graphs | 2018-06-05 | - |
| 70 | NL I3D | 44.4 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 71 | Motion Feature Net | 43.9 | No | Motion Feature Network: Fixed Motion Filter for ... | 2018-07-26 | - |
| 72 | 2-Stream TRN | 42.01 | No | Temporal Relational Reasoning in Videos | 2017-11-22 | Code |
| 73 | HF-TSN (ImageNet pretraining) | 41.97 | Yes | Hierarchical Feature Aggregation Networks for Vi... | 2019-05-29 | - |
| 74 | MARS+RGB+Flow (16 frames, Kinetics pretrained) | 40.4 | No | - | - | Code |
| 75 | M-TRN | 34.4 | No | Temporal Relational Reasoning in Videos | 2017-11-22 | Code |
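
Many entries above report ensembles (for example "8+16 frames" or an "En" suffix) or multi-clip testing ("2 clips" versus "a single clip"). One common way to fuse such predictions is to average each view's per-class softmax probabilities and take the argmax. This is an assumption about the general technique, not a description of any listed paper; individual methods may instead average logits, weight members, or use other fusion rules. The helper names (`softmax`, `average_scores`, `fused_prediction`) and the toy logits are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_scores(per_view_logits: Sequence[Sequence[float]]) -> list[float]:
    """Average softmax probabilities over several views of the same video.

    Each inner sequence holds one view's class logits. A "view" may be one
    clip of a multi-clip test or one member of a model ensemble (assumed
    fusion rule: unweighted mean of probabilities).
    """
    if not per_view_logits:
        raise ValueError("need at least one view")
    probs = [softmax(v) for v in per_view_logits]
    num_classes = len(probs[0])
    if any(len(p) != num_classes for p in probs):
        raise ValueError("all views must score the same classes")
    return [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]


def fused_prediction(per_view_logits: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest averaged probability."""
    avg = average_scores(per_view_logits)
    return max(range(len(avg)), key=avg.__getitem__)


# Toy 3-class example: a hypothetical 8-frame model favours class 0, while a
# hypothetical 16-frame model is more confident about class 1.
logits_8_frames = [2.0, 1.0, 0.0]
logits_16_frames = [0.0, 3.0, 0.0]
print(fused_prediction([logits_8_frames]))                    # 0
print(fused_prediction([logits_8_frames, logits_16_frames]))  # 1
```

Averaging probabilities rather than hard votes lets a confident member outweigh a less confident one, which is why the fused prediction above flips from class 0 to class 1.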