Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video on Charades

Metric: MAP (mean average precision; higher is better)
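For readers unfamiliar with the metric, here is a minimal sketch of how mean average precision is commonly computed for multi-label video classification: average precision is calculated per class over all videos ranked by score, then averaged across classes. The function names and toy data are hypothetical, skipping classes with no positive examples is an assumption, and this is not the official Charades evaluation script. Leaderboards such as this one usually report the value multiplied by 100.

```python
from typing import List


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Average precision for one class: mean precision at each positive in the ranking."""
    # Rank items from highest to lowest score (stable sort keeps input order on ties).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix: List[List[float]],
                           label_matrix: List[List[int]]) -> float:
    """Inputs are indexed [video][class]; returns the mean of per-class APs in [0, 1]."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        if sum(class_labels) == 0:
            continue  # assumption: classes with no positives are skipped
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 4 videos, 2 action classes.
    toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
    toy_labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
    print(round(100 * mean_average_precision(toy_scores, toy_labels), 2))
```

In practice, a library routine such as scikit-learn's `average_precision_score` is a more robust choice than a hand-rolled loop, particularly in how tied scores are handled.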

Results

| # | Model | MAP | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | TokenLearner | 66.3 | No | TokenLearner: What Can 8 Learned Tokens Do for I... | 2021-06-21 | Code |
| 2 | TubeViT-L | 66.2 | No | Rethinking Video ViTs: Sparse Video Tubes for Jo... | 2022-12-06 | Code |
| 3 | MoViNet-A6 | 63.2 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 4 | DEEP-HAL with ODF+SDF (AssembleNet++) | 62.29 | No | Self-supervising Action Recognition by Statistic... | 2020-01-14 | - |
| 5 | AssembleNet++ 50 | 59.8 | No | AssembleNet++: Assembling Modality Representatio... | 2020-08-18 | Code |
| 6 | AssembleNet | 58.6 | Yes | AssembleNet: Searching for Multi-Stream Neural C... | 2019-05-30 | Code |
| 7 | AssembleNet-101 | 58.6 | No | AssembleNet: Searching for Multi-Stream Neural C... | 2019-05-30 | Code |
| 8 | VicTR (ViT-L/14) | 57.6 | No | VicTR: Video-conditioned Text Representations fo... | 2023-04-05 | - |
| 9 | AssembleNet++ 50 without object | 54.98 | No | AssembleNet++: Assembling Modality Representatio... | 2020-08-18 | Code |
| 10 | BIKE | 50.7 | No | Bidirectional Cross-Modal Knowledge Exploration ... | 2022-12-31 | Code |
| 11 | DEEP-HAL with ODF+SDF (I3D) | 50.16 | No | Self-supervising Action Recognition by Statistic... | 2020-01-14 | - |
| 12 | MoViNet-A4 | 48.5 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 13 | AdaFocus (weak supervision, MViT-B-24, 32x3) | 47.8 | No | Towards Weakly Supervised End-to-end Learning fo... | 2023-11-28 | - |
| 14 | MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 15 | En-VidTr-L | 47.3 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 16 | MViT-B, 32x3 (Kinetics-600 pretraining) | 47.1 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 17 | MViT-B-24, 32x3 (Kinetics-400 pretraining) | 46.3 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 18 | SlowFast (Kinetics-600 pretraining, NL) | 45.2 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 19 | MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 20 | ActionCLIP (ViT-B/16) | 44.3 | No | ActionCLIP: A New Paradigm for Video Action Reco... | 2021-09-17 | Code |
| 21 | MViT-B, 16x4 (Kinetics-600 pretraining) | 43.9 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 22 | VidTr-L | 43.5 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 23 | JMRN + R101-NL-LFB | 43.23 | No | Pose And Joint-Aware Action Recognition | 2020-10-16 | Code |
| 24 | HAF+BoW/FV/OFF halluc. +MSK×8/PN | 43.1 | No | Hallucinating IDT Descriptors and I3D Optical Fl... | 2019-06-13 | - |
| 25 | LFB | 42.5 | Yes | Long-Term Feature Banks for Detailed Video Under... | 2018-12-12 | Code |
| 26 | SlowFast (Kinetics-400 pretraining, NL) | 42.5 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 27 | SlowFast (Kinetics-600 pretraining) | 42.1 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 28 | AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4) | 41.4 | No | Towards Weakly Supervised End-to-end Learning fo... | 2023-11-28 | - |
| 29 | AdaFocus (weak supervision, X3D-L, 32x3) | 41.2 | No | Towards Weakly Supervised End-to-end Learning fo... | 2023-11-28 | - |
| 30 | Timeception (R3D) | 41.1 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code |
| 31 | PA3D + (GCN + I3D + NL I3D) | 41 | No | - | - | - |
| 32 | PoTion + (GCN + I3D + NL I3D) | 40.8 | No | - | - | - |
| 33 | MViT-B, 16x4 (Kinetics-400 pretraining) | 40 | No | Multiscale Vision Transformers | 2021-04-22 | Code |
| 34 | STRG | 39.7 | Yes | Videos as Space-Time Region Graphs | 2018-06-05 | - |
| 35 | AdaFocus (weak supervision, Slowfast-R50, 16x8) | 39.3 | No | Towards Weakly Supervised End-to-end Learning fo... | 2023-11-28 | - |
| 36 | STLT + I3D | 38.5 | No | Revisiting spatio-temporal layouts for compositi... | 2021-11-02 | Code |
| 37 | EvaNet | 38.1 | Yes | Evolving Space-Time Neural Architectures for Vid... | 2018-11-26 | - |
| 38 | Timeception (I3D) | 37.2 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code |
| 39 | I3D | 32.9 | No | Quo Vadis, Action Recognition? A New Model and t... | 2017-05-22 | Code |
| 40 | MoViNet-A2 | 32.5 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 41 | Timeception (R2D) | 31.6 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code |
| 42 | MultiScale TRN | 25.2 | Yes | Temporal Relational Reasoning in Videos | 2017-11-22 | Code |
| 43 | Co Slow_64 | 25.2 | No | Continual 3D Convolutional Neural Networks for R... | 2021-05-31 | Code |
| 44 | Slow-8×8 | 24.1 | No | Continual 3D Convolutional Neural Networks for R... | 2021-05-31 | Code |
| 45 | Asyn-TF | 22.4 | Yes | Asynchronous Temporal Fields for Action Recognit... | 2016-12-19 | Code |
| 46 | CoViAR | 21.9 | Yes | Compressed Video Action Recognition | 2017-12-02 | Code |
| 47 | Co Slow_8 | 21.5 | No | Continual 3D Convolutional Neural Networks for R... | 2021-05-31 | Code |
| 48 | 2-Strm | 18.6 | No | Two-Stream Convolutional Networks for Action Rec... | 2014-06-09 | Code |
| 49 | JMRN (Pose only) | 16.2 | No | Pose And Joint-Aware Action Recognition | 2020-10-16 | Code |