Action Recognition on ActivityNet
Metric: mAP (higher is better)
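For readers unfamiliar with the metric, the following is a minimal sketch of how a mean average precision figure can be computed for video-level classification. It assumes the common non-interpolated definition (per-class AP is the mean of precision at each rank holding a positive, and mAP is the unweighted mean over classes); the official ActivityNet evaluation code may differ in details. The function names `average_precision` and `mean_average_precision` and the input layout are illustrative, not taken from any listed paper.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive.

    Assumption: non-interpolated AP; ties keep their input order.
    """
    # Rank videos by descending confidence for this class.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive videos contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    class_scores: Sequence[Sequence[float]],
    class_labels: Sequence[Sequence[int]],
) -> float:
    """Unweighted mean of per-class AP.

    class_scores[c][i] is the score of video i for class c (hypothetical layout);
    class_labels[c][i] is 1 if video i belongs to class c, else 0.
    """
    aps = [average_precision(s, l) for s, l in zip(class_scores, class_labels)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    scores = [[0.9, 0.8, 0.7, 0.6], [0.1, 0.9, 0.2, 0.8]]
    labels = [[1, 0, 1, 0], [0, 1, 0, 1]]
    # Leaderboards usually report the value as a percentage.
    print(f"mAP = {100 * mean_average_precision(scores, labels):.1f}")
```

The result is a fraction in [0, 1]; multiplying by 100 gives the percentage scale used in the table.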
Results
| # | Model | mAP | Extra Data | SOTA | Paper | Date | Code |
|---|-------|-----|------------|------|-------|------|------|
| 1 | Text4Vis (w/ ViT-L) | 96.9 | No | SOTA | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 2022-07-04 | Code |
| 2 | BIKE | 96.1 | No | | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 2022-12-31 | Code |
| 3 | InternVideo2-6B | 95.9 | Yes | | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 2024-03-22 | Code |
| 4 | NSNet (w/ Swin-L) | 94.3 | No | | NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition | 2022-07-21 | - |
| 5 | TSQNet (w/ Swin-L) | 93.7 | No | | Temporal Saliency Query Network for Efficient Video Recognition | 2022-07-21 | - |
| 6 | DSANet (w/ 3D ResNet50) | 90.5 | No | SOTA | DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | 2021-05-25 | Code |
| 7 | MARL (w/ SEResNeXt-152) | 90.05 | No | SOTA | Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition | 2019-07-31 | - |
| 8 | ListenToLook | 89.9 | No | | Listen to Look: Action Recognition by Previewing Audio | 2019-12-10 | Code |
| 9 | DSN | 87.9 | No | | Dynamic Sampling Networks for Efficient Action Recognition in Videos | 2020-06-28 | - |
| 10 | SMART | 84.4 | No | | SMART Frame Selection for Action Recognition | 2020-12-19 | - |
| 11 | Ada3D | 84 | No | | 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition | 2020-12-29 | - |
| 12 | RRA | 83.4 | No | SOTA | Fine-grained Video Categorization with Redundancy Reduction Attention | 2018-10-26 | - |
| 13 | P3D | 78.9 | No | SOTA | Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | 2017-11-28 | Code |
| 14 | LSTM + Pretrained on YT-8M | 75.6 | No | SOTA | YouTube-8M: A Large-Scale Video Classification Benchmark | 2016-09-27 | Code |
| 15 | VGG19 + 393K webcam images | 53.8 | Yes | SOTA | Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web | 2015-12-22 | - |
| 16 | CD-UAR | 53.8 | No | | Towards Universal Representation for Unseen Action Recognition | 2018-03-22 | - |
| 17 | VGG19 | 52.3 | No | | Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web | 2015-12-22 | - |