Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Recognition on UCF101

Metric: 3-fold Accuracy (higher is better)
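
The page does not define the metric further. A minimal sketch of how it could be computed, assuming "3-fold Accuracy" means the unweighted mean of top-1 accuracy over UCF101's three standard train/test splits (the usual convention, but an assumption here). The function names and the toy labels are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence, Tuple

Labels = Sequence[int]


def split_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Top-1 accuracy, in percent, on a single test split."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def three_fold_accuracy(splits: Sequence[Tuple[Labels, Labels]]) -> float:
    """Unweighted mean of per-split accuracies (assumed metric definition)."""
    if len(splits) != 3:
        raise ValueError("expected exactly three (y_true, y_pred) pairs")
    return sum(split_accuracy(t, p) for t, p in splits) / 3


# Toy example: made-up labels, not real UCF101 predictions.
toy_splits = [
    ([0, 1, 2, 3], [0, 1, 2, 3]),  # 4 of 4 correct
    ([0, 1, 2, 3], [0, 1, 2, 0]),  # 3 of 4 correct
    ([0, 1, 2, 3], [0, 1, 0, 0]),  # 2 of 4 correct
]
mean_acc = three_fold_accuracy(toy_splits)
print(f"3-fold accuracy: {mean_acc:.2f}")
```

Some entries in the table below are labelled "Split 1", which suggests they report a single split rather than the three-split mean, so such rows may not be directly comparable with the rest.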

Results

| # | Model | 3-fold Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | FTP-UniFormerV2-L/14 | 99.7 | No | Enhancing Video Transformers for Action Understa... | 2024-03-24 | - |
| 2 | VideoMAE V2-g | 99.6 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders w... | 2023-03-29 | Code |
| 3 | OmniVec | 99.6 | Yes | OmniVec: Learning robust representations with cr... | 2023-11-07 | - |
| 4 | OmniVec2 | 99.6 | Yes | - | - | - |
| 5 | VideoMAE V2-g | 99.6 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders w... | 2023-03-29 | Code |
| 6 | BIKE | 98.8 | Yes | Bidirectional Cross-Modal Knowledge Exploration ... | 2022-12-31 | Code |
| 7 | SMART | 98.64 | No | SMART Frame Selection for Action Recognition | 2020-12-19 | - |
| 8 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 98.6 | Yes | Omni-sourced Webly-supervised Learning for Video... | 2020-03-29 | Code |
| 9 | PERF-Net (multi-distilled S3D) | 98.6 | No | PERF-Net: Pose Empowered RGB-Flow Net | 2020-09-28 | - |
| 10 | ZeroI2V ViT-L/14 | 98.6 | Yes | ZeroI2V: Zero-Cost Adaptation of Pre-trained Tra... | 2023-10-02 | Code |
| 11 | LGD-3D Two-stream | 98.2 | No | Learning Spatio-Temporal Representation with Loc... | 2019-06-13 | - |
| 12 | Text4Vis | 98.2 | No | Revisiting Classifier: Transferring Vision-Langu... | 2022-07-04 | Code |
| 13 | Two-Stream I3D (Imagenet+Kinetics pre-training) | 98 | No | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 14 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 97.8 | Yes | - | - | Code |
| 15 | HATNet (32 frames) | 97.8 | No | Large Scale Holistic Video Understanding | 2019-04-25 | Code |
| 16 | Two-Stream I3D (Kinetics pre-training) | 97.8 | No | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 17 | BubbleNET | 97.62 | Yes | - | - | - |
| 18 | D3D + D3D | 97.6 | No | D3D: Distilled 3D Networks for Video Action Reco... | 2018-12-19 | Code |
| 19 | BQN | 97.6 | No | Busy-Quiet Video Disentangling for Video Classif... | 2021-03-29 | Code |
| 20 | MVD (ViT-B) | 97.5 | No | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 21 | CCS + TSN (ImageNet+Kinetics pretrained) | 97.4 | Yes | Cooperative Cross-Stream Network for Discriminat... | 2019-08-27 | - |
| 22 | R[2+1]D-TwoStream (Kinetics pretrained) | 97.3 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 23 | SSL-KD (R21D-18) | 97.3 | No | A Large-Scale Analysis on Self-Supervised Video ... | 2023-06-09 | - |
| 24 | Multi-stream I3D | 97.2 | No | - | - | - |
| 25 | CA2ST(B/16) | 97.2 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 26 | Hidden Two-Stream | 97.1 | No | Hidden Two-Stream Convolutional Networks for Act... | 2017-04-02 | Code |
| 27 | D3D (Kinetics-600 pretraining) | 97.1 | Yes | D3D: Distilled 3D Networks for Video Action Reco... | 2018-12-19 | Code |
| 28 | AMD(ViT-B/16) | 97.1 | Yes | Asymmetric Masked Distillation for Pre-Training ... | 2023-11-06 | - |
| 29 | D3D (Kinetics-400 pretraining) | 97 | Yes | D3D: Distilled 3D Networks for Video Action Reco... | 2018-12-19 | Code |
| 30 | LGD-3D RGB | 97 | No | Learning Spatio-Temporal Representation with Loc... | 2019-06-13 | - |
| 31 | STAM-32 (ImageNet/Kinetics pretraining) | 97 | Yes | An Image is Worth 16x16 Words, What is a Video W... | 2021-03-25 | Code |
| 32 | FASTER32 | 96.9 | No | FASTER Recurrent Networks for Efficient Video Cl... | 2019-06-10 | - |
| 33 | R[2+1]D-RGB (Kinetics pretrained) | 96.8 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 34 | S3D-G (ImageNet, Kinetics-400 pretrained) | 96.8 | Yes | Rethinking Spatiotemporal Feature Learning: Spee... | 2017-12-13 | Code |
| 35 | LGD-3D Flow | 96.8 | No | Learning Spatio-Temporal Representation with Loc... | 2019-06-13 | - |
| 36 | Flow-I3D (Imagenet+Kinetics pre-training) | 96.7 | Yes | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 37 | VidTr-L | 96.7 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 38 | CMA iter1-S | 96.5 | No | Two-Stream Video Classification with Cross-Modal... | 2019-08-01 | - |
| 39 | Flow-I3D (Kinetics pre-training) | 96.5 | Yes | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 40 | I3D RGB + DMC-Net (I3D) | 96.5 | No | DMC-Net: Generating Discriminative Motion Cues f... | 2019-01-11 | - |
| 41 | M3Video | 96.5 | No | Masked Motion Encoding for Self-Supervised Video... | 2022-10-12 | Code |
| 42 | A2-Net (ResNet-50) | 96.4 | No | $A^2$-Nets: Double Attention Networks | 2018-10-27 | - |
| 43 | pBYOL | 96.3 | No | A Large-Scale Study on Unsupervised Spatiotempor... | 2021-04-29 | Code |
| 44 | STM (ImageNet+Kinetics pretrain) | 96.2 | No | STM: SpatioTemporal and Motion Encoding for Acti... | 2019-08-07 | - |
| 45 | VideoMAE | 96.1 | No | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 46 | MF-Net, RGB only (ImageNet+Kinetics pretrained) | 96 | Yes | Multi-Fiber Networks for Video Recognition | 2018-07-30 | - |
| 47 | Optical Flow Guided Feature | 96 | No | Optical Flow Guided Feature: A Fast and Robust M... | 2017-11-29 | Code |
| 48 | MARS+RGB+Flow (16 frames) | 95.8 | No | - | - | Code |
| 49 | Prob-Distill | 95.7 | No | Attention Distillation for Learning Video Repres... | 2019-04-05 | - |
| 50 | RGB-I3D (Imagenet+Kinetics pre-training) | 95.6 | Yes | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 51 | R[2+1]D-Flow (Kinetics pretrained) | 95.5 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 52 | TVNet+IDT | 95.4 | No | End-to-End Learning of Motion Representation for... | 2018-04-02 | Code |
| 53 | SCE (R3D-50) | 95.3 | No | Similarity Contrastive Estimation for Image and ... | 2022-12-21 | Code |
| 54 | TesNet (ImageNet pretrained) | 95.2 | Yes | Learning spatio-temporal representations with te... | 2020-02-11 | - |
| 55 | MMV TSM-50x2 | 95.2 | No | Self-Supervised MultiModal Versatile Networks | 2020-06-29 | Code |
| 56 | I3D-LSTM | 95.1 | No | - | - | Code |
| 57 | RGB-I3D (Kinetics pre-training) | 95.1 | Yes | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 58 | R[2+1]D-TwoStream (Sports-1M pretrained) | 95 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 59 | X3D MobileNet-V3 LGD-GC | 94.85 | Yes | LIGAR: Lightweight General-purpose Action Recogn... | 2021-08-30 | Code |
| 60 | ST-ResNet + IDT | 94.6 | No | Spatiotemporal Residual Networks for Video Actio... | 2016-11-07 | Code |
| 61 | ResNeXt-101 (64f) | 94.5 | No | Can Spatiotemporal 3D CNNs Retrace the History o... | 2017-11-27 | Code |
| 62 | R-STAN-101 | 94.5 | No | - | - | - |
| 63 | TSN+TSM | 94.3 | No | Temporal-Spatial Mapping for Action Recognition | 2018-09-11 | - |
| 64 | ARTNet w/ TSN | 94.3 | No | Appearance-and-Relation Networks for Video Class... | 2017-11-24 | Code |
| 65 | Temporal Segment Networks | 94.2 | No | Temporal Segment Networks: Towards Good Practice... | 2016-08-02 | Code |
| 66 | TS-LSTM | 94.1 | No | TS-LSTM and Temporal-Inception: Exploiting Spati... | 2017-03-30 | Code |
| 67 | XKD (ViT-B/112/16) | 94.1 | No | XKD: Cross-modal Knowledge Distillation with Dom... | 2022-11-25 | Code |
| 68 | CVRL (R3D-152 2x; K600) | 93.9 | No | Spatiotemporal Contrastive Video Representation ... | 2020-08-09 | Code |
| 69 | SVT | 93.7 | No | Self-supervised Video Transformer | 2021-12-02 | Code |
| 70 | RSPNet | 93.7 | No | RSPNet: Relative Speed Perception for Unsupervis... | 2020-10-27 | Code |
| 71 | R[2+1]D-RGB (Sports-1M pretrained) | 93.6 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 72 | Two-stream I3D | 93.4 | No | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 73 | CVRL (R3D-50; K600) | 93.4 | No | Spatiotemporal Contrastive Video Representation ... | 2020-08-09 | Code |
| 74 | VideoMS (ViT-B) | 93.4 | No | EVEREST: Efficient Masked Video Autoencoder by R... | 2022-11-19 | Code |
| 75 | XKD-Modality-Agnostic (ViT-B/112/16) | 93.4 | No | XKD: Cross-modal Knowledge Distillation with Dom... | 2022-11-25 | Code |
| 76 | R[2+1]D-Flow (Sports-1M pretrained) | 93.3 | Yes | A Closer Look at Spatiotemporal Convolutions for... | 2017-11-30 | Code |
| 77 | BraVe:V-FA (TSM-50x2) | 93.1 | No | Broaden Your Views for Self-Supervised Video Lea... | 2021-03-30 | Code |
| 78 | VIMPAC | 92.7 | No | VIMPAC: Video Pre-Training via Masked Token Pred... | 2021-06-21 | Code |
| 79 | S:VGG-16, T:VGG-16 (ImageNet pretrain) | 92.5 | Yes | Convolutional Two-Stream Network Fusion for Vide... | 2016-04-22 | Code |
| 80 | CrissCross (AudioSet) | 92.4 | No | Self-Supervised Audio-Visual Representation Lear... | 2021-11-09 | Code |
| 81 | DMC-Net (I3D) | 92.3 | No | DMC-Net: Generating Discriminative Motion Cues f... | 2019-01-11 | - |
| 82 | CVRL (R3D-50; K400) | 92.2 | No | Spatiotemporal Contrastive Video Representation ... | 2020-08-09 | Code |
| 83 | two-in-one two stream | 92 | No | Dance with Flow: Two-in-One Stream Action Detect... | 2019-04-01 | Code |
| 84 | LTC | 91.7 | No | Long-term Temporal Convolutions for Action Recog... | 2016-04-15 | Code |
| 85 | R-STAN-50 | 91.5 | No | - | - | - |
| 86 | TDD + IDT | 91.5 | No | Action Recognition with Trajectory-Pooled Deep-C... | 2015-05-19 | Code |
| 87 | AVID+CMA (Modified R2+1D-18 on Audioset) | 91.5 | No | Audio-Visual Instance Discrimination with Cross-... | 2020-04-27 | Code |
| 88 | CrissCross (Kinetics400) | 91.5 | No | Self-Supervised Audio-Visual Representation Lear... | 2021-11-09 | Code |
| 89 | Very deep two-stream ConvNet | 91.4 | No | Towards Good Practices for Very Deep Two-Stream ... | 2015-07-08 | Code |
| 90 | VideoMAE(no extra data) | 91.3 | No | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 91 | 3D ResNeXt-101 + Confidence Distillation | 91.2 | No | Efficient Action Recognition Using Confidence Di... | 2021-09-05 | - |
| 92 | MR Two-Sream R-CNN | 91.1 | No | - | - | - |
| 93 | AVID (Modified R2+1D-18 on Audioset) | 91 | No | Audio-Visual Instance Discrimination with Cross-... | 2020-04-27 | Code |
| 94 | ViCC (S3D; R+F) | 90.5 | No | Self-supervised Video Representation Learning wi... | 2021-06-18 | Code |
| 95 | Dynamic Image Networks + IDT | 89.1 | No | - | - | Code |
| 96 | ViCC (S3D; RGB) | 88.8 | No | Self-supervised Video Representation Learning wi... | 2021-06-18 | Code |
| 97 | ViCC (R2+1D; R+F) | 88.8 | No | Self-supervised Video Representation Learning wi... | 2021-06-18 | Code |
| 98 | Two-stream+LSTM | 88.6 | No | Beyond Short Snippets: Deep Networks for Video C... | 2015-03-31 | Code |
| 99 | P3D (ImageNet + Sports1M) | 88.6 | Yes | Learning Spatio-Temporal Representation with Pse... | 2017-11-28 | Code |
| 100 | CrissCross (Kinetics-Sound) | 88.3 | No | Self-Supervised Audio-Visual Representation Lear... | 2021-11-09 | Code |
| 101 | Two-Stream (ImageNet pretrained) | 88 | Yes | Two-Stream Convolutional Networks for Action Rec... | 2014-06-09 | Code |
| 102 | AVID+CMA (Modified R2+1D-18 on Kinetics) | 87.5 | No | Audio-Visual Instance Discrimination with Cross-... | 2020-04-27 | Code |
| 103 | AVID (Modified R2+1D-18 on Kinetics) | 86.9 | No | Audio-Visual Instance Discrimination with Cross-... | 2020-04-27 | Code |
| 104 | MV-CNN | 86.4 | No | Real-time Action Recognition with Enhanced Motio... | 2016-04-26 | Code |
| 105 | Dynamics 2 for DenseNet-201 Transformer | 86.1 | No | Video Action Recognition Collaborative Learning ... | 2023-02-17 | Code |
| 106 | R(2+1)D-18 (DistInit pretraining) | 85.8 | No | DistInit: Learning Video Representations Without... | 2019-01-26 | - |
| 107 | Res3D | 85.8 | No | ConvNet Architecture Search for Spatiotemporal F... | 2017-08-16 | Code |
| 108 | MCN (R3D-18; RGB) | 85.4 | No | Self-Supervised Video Representation Learning wi... | 2021-08-19 | - |
| 109 | MCN (R2+1D; RGB) | 84.8 | No | Self-Supervised Video Representation Learning wi... | 2021-08-19 | - |
| 110 | ActionFlowNet | 83.9 | No | ActionFlowNet: Learning Motion Representation fo... | 2016-12-09 | - |
| 111 | ViCC (R2+1D; RGB) | 82.8 | No | Self-supervised Video Representation Learning wi... | 2021-06-18 | Code |
| 112 | TCLR (R3D-18) | 82.4 | No | TCLR: Temporal Contrastive Learning for Video Re... | 2021-01-20 | Code |
| 113 | C3D | 82.3 | No | Learning Spatiotemporal Features with 3D Convolu... | 2014-12-02 | Code |
| 114 | PCL (ResNet-18) | 82.3 | No | Pretext-Contrastive Learning: Toward Good Practi... | 2020-10-29 | Code |
| 115 | HalluciNet (ResNet-50) | 79.83 | No | HalluciNet-ing Spatiotemporal Representations Us... | 2019-12-10 | Code |
| 116 | R[2+1]D (VideoMoCo) | 78.7 | No | VideoMoCo: Contrastive Video Representation Lear... | 2021-03-10 | Code |
| 117 | DPC (Modified 3D Resnet-34) | 75.7 | No | Video Representation Learning by Dense Predictiv... | 2019-09-10 | Code |
| 118 | 3D-SqueezeNet | 74.94 | No | Resource Efficient 3D Convolutional Neural Netwo... | 2019-04-04 | Code |
| 119 | CoCLR | 74.5 | No | Self-supervised Co-training for Video Representa... | 2020-10-19 | Code |
| 120 | IIC (R3D) | 74.4 | No | Self-supervised Video Representation Learning Us... | 2020-08-06 | Code |
| 121 | 3D-ResNet-18 (VideoMoCo) | 74.1 | No | VideoMoCo: Contrastive Video Representation Lear... | 2021-03-10 | Code |
| 122 | ViCC (S3D; RGB) | 72.2 | No | Self-supervised Video Representation Learning wi... | 2021-06-18 | Code |
| 123 | TCE (ResNet-50) | 71.2 | No | Temporally Coherent Embeddings for Self-Supervis... | 2020-03-21 | Code |
| 124 | TCE (ResNet-18, Split 1) | 68.8 | No | Temporally Coherent Embeddings for Self-Supervis... | 2020-03-21 | Code |
| 125 | DPC (3D ResNet-18) | 68.2 | No | Video Representation Learning by Dense Predictiv... | 2019-09-10 | Code |
| 126 | TCE (ResNet18, Split 1) | 68.2 | No | Temporally Coherent Embeddings for Self-Supervis... | 2020-03-21 | Code |
| 127 | VCP (R3D) | 66 | No | Video Cloze Procedure for Self-Supervised Spatio... | 2020-01-02 | Code |
| 128 | 3D Cubic Puzzles (3D ResNet-18) | 65.8 | No | Self-Supervised Video Representation Learning wi... | 2018-11-24 | - |
| 129 | Slow Fusion + Finetune top 3 layers | 65.4 | Yes | - | - | Code |
| 130 | Video Clip Ordering (R3D) | 64.9 | No | - | - | - |
| 131 | Skip-Clip (3D ResNet-18) | 64.4 | No | Skip-Clip: Self-Supervised Spatiotemporal Repres... | 2019-10-28 | - |
| 132 | MLGCN | 63.27 | No | - | - | - |
| 133 | 3D RotNet (3D ResNet-18) | 62.9 | No | Self-Supervised Spatiotemporal Feature Learning ... | 2018-11-28 | - |
| 134 | DPC (3D ResNet-18, Split 1) | 60.6 | No | Video Representation Learning by Dense Predictiv... | 2019-09-10 | Code |
| 135 | O3N (AlexNet) | 60.3 | No | Self-Supervised Video Representation Learning Wi... | 2016-11-21 | - |
| 136 | Contrastive Multiview Coding (CaffeNet x2) | 59.1 | No | Contrastive Multiview Coding | 2019-06-13 | Code |
| 137 | Motion & Appearance (C3D) | 58.8 | No | Self-supervised Spatio-temporal Representation L... | 2019-04-07 | Code |
| 138 | 3D-ShuffleNetV2 0.25x | 56.52 | No | Resource Efficient 3D Convolutional Neural Netwo... | 2019-04-04 | Code |
| 139 | 3D-MobileNetV2 0.2x | 55.56 | No | Resource Efficient 3D Convolutional Neural Netwo... | 2019-04-04 | Code |
| 140 | Arrow of Time (AlexNet) | 55.3 | No | - | - | - |
| 141 | VideoGan (C3D) | 52.1 | No | Generating Videos with Scene Dynamics | 2016-09-08 | - |
| 142 | Shuffle and Learn (AlexNet) | 50.9 | No | Shuffle and Learn: Unsupervised Learning using T... | 2016-03-28 | - |
| 143 | Baseline UCF101 | 43.9 | No | UCF101: A Dataset of 101 Human Actions Classes F... | 2012-12-03 | Code |
| 144 | CD-UAR | 42.5 | No | Towards Universal Representation for Unseen Acti... | 2018-03-22 | - |
| 145 | SL | 35.2 | No | - | - | - |
| 146 | I3D + PoTion | 29.3 | No | - | - | - |