Video on Charades
Metric: MAP (mean average precision, higher is better)
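For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is commonly computed for multi-label video classification. It is an illustration only: the leaderboard does not state the exact evaluation protocol, and the function names `average_precision` and `mean_average_precision`, the nested-list input layout, and the choice to skip classes with no positive examples are all assumptions made here.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of the precision at each rank holding a positive."""
    # Rank items from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(
    score_matrix: List[List[float]], label_matrix: List[List[int]]
) -> float:
    """Average the per-class AP and report it on a 0-100 scale.

    score_matrix[v][c] is the score of class c for video v and
    label_matrix[v][c] is 1 if class c occurs in video v, else 0.
    Classes without any positive label are skipped (an assumption).
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):
            aps.append(average_precision(col_scores, col_labels))
    return 100.0 * sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Three videos, two classes (toy data, not from the leaderboard).
    toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3]]
    toy_labels = [[1, 0], [0, 1], [1, 0]]
    print(round(mean_average_precision(toy_scores, toy_labels), 2))
```

Under this definition a model that ranks every positive video above every negative one for every class scores 100, which is why higher values are better.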
Results
| # | Model | MAP | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-----|------------|-------|------|------|------|
| 1 | TokenLearner | 66.3 | No | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | 2021-06-21 | Code | SOTA |
| 2 | TubeViT-L | 66.2 | No | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | 2022-12-06 | Code | |
| 3 | MoViNet-A6 | 63.2 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Code | SOTA |
| 4 | DEEP-HAL with ODF+SDF (AssembleNet++) | 62.29 | No | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | 2020-01-14 | - | SOTA |
| 5 | AssembleNet++ 50 | 59.8 | No | AssembleNet++: Assembling Modality Representations via Attention Connections | 2020-08-18 | Code | |
| 6 | AssembleNet | 58.6 | Yes | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | 2019-05-30 | Code | SOTA |
| 7 | AssembleNet-101 | 58.6 | No | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | 2019-05-30 | Code | |
| 8 | VicTR (ViT-L/14) | 57.6 | No | VicTR: Video-conditioned Text Representations for Activity Recognition | 2023-04-05 | - | |
| 9 | AssembleNet++ 50 without object | 54.98 | No | AssembleNet++: Assembling Modality Representations via Attention Connections | 2020-08-18 | Code | |
| 10 | BIKE | 50.7 | No | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 2022-12-31 | Code | |
| 11 | DEEP-HAL with ODF+SDF (I3D) | 50.16 | No | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | 2020-01-14 | - | |
| 12 | MoViNet-A4 | 48.5 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Code | |
| 13 | AdaFocus (weak supervision, MViT-B-24, 32x3) | 47.8 | No | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | 2023-11-28 | - | |
| 14 | MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 15 | En-VidTr-L | 47.3 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - | |
| 16 | MViT-B, 32x3 (Kinetics-600 pretraining) | 47.1 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 17 | MViT-B-24, 32x3 (Kinetics-400 pretraining) | 46.3 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 18 | SlowFast (Kinetics-600 pretraining, NL) | 45.2 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code | SOTA |
| 19 | MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 20 | ActionCLIP (ViT-B/16) | 44.3 | No | ActionCLIP: A New Paradigm for Video Action Recognition | 2021-09-17 | Code | |
| 21 | MViT-B, 16x4 (Kinetics-600 pretraining) | 43.9 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 22 | VidTr-L | 43.5 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - | |
| 23 | JMRN + R101-NL-LFB | 43.23 | No | Pose And Joint-Aware Action Recognition | 2020-10-16 | Code | |
| 24 | HAF+BoW/FV/OFF halluc. +MSK×8/PN | 43.1 | No | Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs | 2019-06-13 | - | |
| 25 | LFB | 42.5 | Yes | Long-Term Feature Banks for Detailed Video Understanding | 2018-12-12 | Code | |
| 26 | SlowFast (Kinetics-400 pretraining, NL) | 42.5 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code | |
| 27 | SlowFast (Kinetics-600 pretraining) | 42.1 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code | |
| 28 | AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4) | 41.4 | No | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | 2023-11-28 | - | |
| 29 | AdaFocus (weak supervision, X3D-L, 32x3) | 41.2 | No | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | 2023-11-28 | - | |
| 30 | Timeception (R3D) | 41.1 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code | SOTA |
| 31 | PA3D + (GCN + I3D + NL I3D) | 41 | No | - | - | - | |
| 32 | PoTion + (GCN + I3D + NL I3D) | 40.8 | No | - | - | - | |
| 33 | MViT-B, 16x4 (Kinetics-400 pretraining) | 40 | No | Multiscale Vision Transformers | 2021-04-22 | Code | |
| 34 | STRG | 39.7 | Yes | Videos as Space-Time Region Graphs | 2018-06-05 | - | SOTA |
| 35 | AdaFocus (weak supervision, Slowfast-R50, 16x8) | 39.3 | No | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | 2023-11-28 | - | |
| 36 | STLT + I3D | 38.5 | No | Revisiting spatio-temporal layouts for compositional action recognition | 2021-11-02 | Code | |
| 37 | EvaNet | 38.1 | Yes | Evolving Space-Time Neural Architectures for Videos | 2018-11-26 | - | |
| 38 | Timeception (I3D) | 37.2 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code | |
| 39 | I3D | 32.9 | No | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | 2017-05-22 | Code | SOTA |
| 40 | MoViNet-A2 | 32.5 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Code | |
| 41 | Timeception (R2D) | 31.6 | No | Timeception for Complex Action Recognition | 2018-12-04 | Code | |
| 42 | MultiScale TRN | 25.2 | Yes | Temporal Relational Reasoning in Videos | 2017-11-22 | Code | |
| 43 | Co Slow_64 | 25.2 | No | Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | 2021-05-31 | Code | |
| 44 | Slow-8×8 | 24.1 | No | Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | 2021-05-31 | Code | |
| 45 | Asyn-TF | 22.4 | Yes | Asynchronous Temporal Fields for Action Recognition | 2016-12-19 | Code | SOTA |
| 46 | CoViAR | 21.9 | Yes | Compressed Video Action Recognition | 2017-12-02 | Code | |
| 47 | Co Slow_8 | 21.5 | No | Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | 2021-05-31 | Code | |
| 48 | 2-Strm | 18.6 | No | Two-Stream Convolutional Networks for Action Recognition in Videos | 2014-06-09 | Code | SOTA |
| 49 | JMRN (Pose only) | 16.2 | No | Pose And Joint-Aware Action Recognition | 2020-10-16 | Code | |