Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Activity Recognition on AVA v2.2

Metric: mAP (higher is better)
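As a rough illustration of the metric, the sketch below computes a generic, non-interpolated mean average precision over per-class ranked predictions. It is not the official AVA evaluation protocol, which additionally matches predicted boxes to ground-truth boxes before scoring; the function names and the toy inputs are assumptions made for this example, and the simplification assumes every ground-truth positive appears in the ranked list.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class.

    scores: confidence per prediction; labels: 1 if the prediction is correct, else 0.
    AP is the mean of the precision values measured at each correct prediction.
    """
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_pos = sum(label for _, label in ranked)
    if num_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_pos


def mean_average_precision(
    per_class: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Unweighted mean of the per-class AP values."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy predictions for two action classes.
toy = {
    "stand": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    "sit": ([0.9, 0.1], [0, 1]),
}
toy_map = mean_average_precision(toy)
```

Leaderboards such as this one conventionally report the value as a percentage, so a fraction like the one above would be multiplied by 100.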


Results

| # | Model | mAP | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | LART (Hiera-H, K700 PT+FT) | 45.1 | Yes | On the Benefits of 3D Pose and Tracking for Huma... | 2023-04-03 | Code |
| 2 | Hiera-H (K700 PT+FT) | 43.3 | Yes | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code |
| 3 | VideoMAE V2-g | 42.6 | Yes | VideoMAE V2: Scaling Video Masked Autoencoders w... | 2023-03-29 | Code |
| 4 | STAR/L | 41.7 | Yes | End-to-End Spatio-Temporal Action Localisation w... | 2023-04-24 | - |
| 5 | MVD (Kinetics400 pretrain+finetune, ViT-H, 16x4) | 41.1 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 6 | InternVideo | 41.01 | Yes | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 7 | MVD (Kinetics400 pretrain, ViT-H, 16x4) | 40.1 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 8 | MaskFeat (Kinetics-600 pretrain, MViT-L) | 39.8 | Yes | Masked Feature Prediction for Self-Supervised Vi... | 2021-12-16 | Code |
| 9 | UMT-L (ViT-L/16) | 39.8 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 10 | VideoMAE (K400 pretrain+finetune, ViT-H, 16x4) | 39.5 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 11 | VideoMAE (K700 pretrain+finetune, ViT-L, 16x4) | 39.3 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 12 | MVD (Kinetics400 pretrain+finetune, ViT-L, 16x4) | 38.7 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 13 | VideoMAE (K400 pretrain+finetune, ViT-L, 16x4) | 37.8 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 14 | MVD (Kinetics400 pretrain, ViT-L, 16x4) | 37.7 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 15 | VideoMAE (K400 pretrain, ViT-H, 16x4) | 36.5 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 16 | VideoMAE (K700 pretrain, ViT-L, 16x4) | 36.1 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 17 | MeMViT-24 | 35.4 | Yes | MeMViT: Memory-Augmented Multiscale Vision Trans... | 2022-01-20 | Code |
| 18 | MViTv2-L (IN21k, K700) | 34.4 | Yes | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 19 | VideoMAE (K400 pretrain, ViT-L, 16x4) | 34.3 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 20 | MVD (Kinetics400 pretrain+finetune, ViT-B, 16x4) | 34.2 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 21 | AMD(ViT-B/16) | 33.5 | Yes | Asymmetric Masked Distillation for Pre-Training ... | 2023-11-06 | - |
| 22 | HIT | 32.6 | No | Holistic Interaction Transformer Network for Act... | 2022-10-23 | Code |
| 23 | VideoMAE (K400 pretrain+finetune, ViT-B, 16x4) | 31.8 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 24 | ACAR-Net, SlowFast R-101 (Kinetics-700 pretraining) | 31.72 | Yes | Actor-Context-Actor Relation Network for Spatio-... | 2020-06-14 | Code |
| 25 | MVD (Kinetics400 pretrain, ViT-B, 16x4) | 31.1 | Yes | Masked Video Distillation: Rethinking Masked Fea... | 2022-12-08 | Code |
| 26 | Object Transformer | 31 | No | Towards Long-Form Video Understanding | 2021-06-21 | Code |
| 27 | MViT-B-24, 32x3 (Kinetics-600 pretraining) | 28.7 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 28 | MViT-B, 32x3 (Kinetics-500 pretraining) | 27.5 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 29 | SlowFast, 16x8 R101+NL (Kinetics-600 pretraining) | 27.5 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 30 | MViT-B, 64x3 (Kinetics-400 pretraining) | 27.3 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 31 | SlowFast, 8x8 R101+NL (Kinetics-600 pretraining) | 27.1 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 32 | MViT-B, 32x3 (Kinetics-400 pretraining) | 26.8 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 33 | VideoMAE (K400 pretrain, ViT-B, 16x4) | 26.7 | Yes | VideoMAE: Masked Autoencoders are Data-Efficient... | 2022-03-23 | Code |
| 34 | ORViT MViT-B, 16x4 (K400 pretraining) | 26.6 | No | Object-Region Video Transformers | 2021-10-13 | Code |
| 35 | MViT-B, 16x4 (Kinetics-600 pretraining) | 26.1 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 36 | MViT-B, 16x4 (Kinetics-400 pretraining) | 24.5 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 37 | SlowFast, 8x8, R101 (Kinetics-400 pretraining) | 23.8 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 38 | SlowFast, 4x16, R50 (Kinetics-400 pretraining) | 21.9 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
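
Because entries with and without extra training data are ranked together, it can help to compare them separately. The sketch below copies a handful of rows from the results above and picks the top entry in each group; the `Entry` class and the `best` helper are hypothetical names introduced for illustration, and only a subset of the leaderboard is included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    model: str
    map_score: float   # mAP as listed on the leaderboard
    extra_data: bool   # True when the "Extra Data" column says "Yes"


# A few rows copied from the results above (not the full table).
ENTRIES: List[Entry] = [
    Entry("LART (Hiera-H, K700 PT+FT)", 45.1, True),
    Entry("Hiera-H (K700 PT+FT)", 43.3, True),
    Entry("HIT", 32.6, False),
    Entry("Object Transformer", 31.0, False),
    Entry("MViT-B-24, 32x3 (Kinetics-600 pretraining)", 28.7, False),
]


def best(entries: List[Entry], extra_data: bool) -> Entry:
    """Return the highest-mAP entry among those with the given extra-data flag."""
    candidates = [e for e in entries if e.extra_data == extra_data]
    return max(candidates, key=lambda e: e.map_score)


top_without_extra = best(ENTRIES, extra_data=False)
top_with_extra = best(ENTRIES, extra_data=True)
```

On these rows, `top_without_extra` is HIT (32.6) and `top_with_extra` is LART (45.1), which matches their positions in the full listing.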