Action Detection on UCF101-24

Metric: Frame-mAP 0.5 (higher is better)
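
The leaderboard does not spell out the evaluation protocol, so the following is only a minimal sketch of how a frame-level mAP at IoU 0.5 is commonly computed. It assumes boxes in `(x1, y1, x2, y2)` form, greedy VOC-style matching in descending score order, and all-point interpolated AP; the function names and data layout are hypothetical, and individual papers may differ in details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, thr=0.5):
    """AP for one class.

    dets: list of (frame_id, score, box) detections (assumed layout).
    gts:  dict mapping frame_id -> list of ground-truth boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {f: [False] * len(boxes) for f, boxes in gts.items()}
    tp_flags = []
    # Visit detections from most to least confident.
    for frame, _score, box in sorted(dets, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(gts.get(frame, [])):
            overlap = iou(box, gt_box)
            if overlap > best:
                best, best_j = overlap, j
        # True positive only if the best-overlapping box is still unmatched.
        if best >= thr and not used[frame][best_j]:
            used[frame][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Running precision and recall after each detection.
    tp, precisions, recalls = 0, [], []
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def frame_map(dets_by_class, gts_by_class, thr=0.5):
    """Mean of per-class APs over every class that has ground truth."""
    aps = [
        average_precision(dets_by_class.get(c, []), gts_by_class[c], thr)
        for c in sorted(gts_by_class)
    ]
    return sum(aps) / len(aps) if aps else 0.0
```

The function returns a fraction in [0, 1], so multiplying by 100 gives a figure on the same scale as the scores listed under Results.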

Results

#	Model	Frame-mAP 0.5	Extra Data	Paper	Date	Code
1	STAR/L	90.3	Yes	End-to-End Spatio-Temporal Action Localisation with Video Transformers	2023-04-24	-
2	SiA	88.5	No	Scaling Open-Vocabulary Action Detection	2025-04-04	Code
3	YOWO + LFB	87.3	No	You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization	2019-11-15	Code
4	HIT	84.8	No	Holistic Interaction Transformer Network for Action Detection	2022-10-23	Code
5	YOWO	80.4	No	You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization	2019-11-15	Code
6	MOC	77.8	No	Actions as Moving Points	2020-01-14	Code
7	Faster-RCNN + two-stream I3D conv	76.3	No	AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions	2017-05-23	Code
8	STEP	75	No	STEP: Spatio-Temporal Progressive Learning for Video Action Detection	2019-04-19	Code
9	Stable Mean Teacher (I3D)	73.9	No	Stable Mean Teacher for Semi-supervised Video Action Detection	2024-12-10	Code
10	HISAN (VGG-16)	73.71	No	-	-	-
11	TACNet	72.1	No	TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection	2019-05-31	-
12	E2E-SSL (I3D)	69.9	No	End-to-End Semi-Supervised Learning for Video Action Detection	2022-03-08	Code
13	T-CNN	41.37	No	Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos	2017-03-30	Code
14	TS R-CNN	39.94	No	-	-	-
15	MR-TS R-CNN	39.63	No	-	-	-

Entries flagged SOTA on the leaderboard: STAR/L, YOWO + LFB, Faster-RCNN + two-stream I3D conv, T-CNN.