Action Detection on UCF101-24
Metric: Frame-mAP 0.5 (higher is better)
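
As a rough illustration of what the metric measures, the sketch below computes a frame-level mean average precision at an IoU threshold of 0.5. It assumes a PASCAL VOC style protocol (greedy matching of score-sorted detections, all-point interpolation of the precision-recall curve); the exact protocol behind each leaderboard entry may differ. The function names (`iou`, `average_precision`, `frame_map`) and the data layout are hypothetical and chosen only for this example. The result is a fraction in [0, 1], whereas the leaderboard reports percentages.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Detection = Tuple[str, float, Box]       # (frame_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Detection],
    ground_truth: Dict[str, List[Box]],
    iou_threshold: float = 0.5,
) -> float:
    """Average precision for one action class over all frames."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {frame: [False] * len(boxes) for frame, boxes in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    # Visit detections from most to least confident.
    for frame, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(frame, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # True positive only if the best-overlapping box is still unmatched.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[frame][best_idx]:
            matched[frame][best_idx] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_gt)
    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def frame_map(
    dets_by_class: Dict[str, List[Detection]],
    gt_by_class: Dict[str, Dict[str, List[Box]]],
    iou_threshold: float = 0.5,
) -> float:
    """Mean of per-class average precision over the classes in gt_by_class."""
    aps = [
        average_precision(dets_by_class.get(cls, []), gt, iou_threshold)
        for cls, gt in gt_by_class.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0
```

A detection counts as correct only when it overlaps a not-yet-matched ground-truth box of the same class in the same frame with IoU of at least 0.5, so duplicate detections of one box are penalised as false positives.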
Results
| # | Model | Frame-mAP 0.5 | Extra Data | Paper | Date | Code | Badge |
|---|-------|---------------|------------|-------|------|------|-------|
| 1 | STAR/L | 90.3 | Yes | End-to-End Spatio-Temporal Action Localisation with Video Transformers | 2023-04-24 | - | SOTA |
| 2 | SiA | 88.5 | No | Scaling Open-Vocabulary Action Detection | 2025-04-04 | Yes | - |
| 3 | YOWO + LFB | 87.3 | No | You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | 2019-11-15 | Yes | SOTA |
| 4 | HIT | 84.8 | No | Holistic Interaction Transformer Network for Action Detection | 2022-10-23 | Yes | - |
| 5 | YOWO | 80.4 | No | You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | 2019-11-15 | Yes | - |
| 6 | MOC | 77.8 | No | Actions as Moving Points | 2020-01-14 | Yes | - |
| 7 | Faster-RCNN + two-stream I3D conv | 76.3 | No | AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | 2017-05-23 | Yes | SOTA |
| 8 | STEP | 75 | No | STEP: Spatio-Temporal Progressive Learning for Video Action Detection | 2019-04-19 | Yes | - |
| 9 | Stable Mean Teacher (I3D) | 73.9 | No | Stable Mean Teacher for Semi-supervised Video Action Detection | 2024-12-10 | Yes | - |
| 10 | HISAN (VGG-16) | 73.71 | No | - | - | - | - |
| 11 | TACNet | 72.1 | No | TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection | 2019-05-31 | - | - |
| 12 | E2E-SSL (I3D) | 69.9 | No | End-to-End Semi-Supervised Learning for Video Action Detection | 2022-03-08 | Yes | - |
| 13 | T-CNN | 41.37 | No | Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos | 2017-03-30 | Yes | SOTA |
| 14 | TS R-CNN | 39.94 | No | - | - | - | - |
| 15 | MR-TS R-CNN | 39.63 | No | - | - | - | - |