3D on ImageNet VID

Metric: mAP (mean average precision; higher is better)
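
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean average precision score is commonly computed. The leaderboard does not state its evaluation protocol, so the all-point interpolation used here, the assumption that detections have already been matched to ground truth (for example by an IoU threshold), and the function names `average_precision` and `mean_average_precision` are all illustrative choices, not the benchmark's official evaluation code.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth object).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve for a single class.

    Assumes matching to ground truth has already been done upstream.
    num_gt is the number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # All-point interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Mean of per-class AP, as a percentage. Assumes at least one class."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return 100.0 * sum(aps) / len(aps)
```

Under these assumptions, a class with two ground-truth objects and ranked detections hit, miss, hit scores an AP of 5/6, and averaging such per-class values gives a percentage on the same scale as the scores listed on this page.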

Results

#	Model	mAP	Augmentations	Paper	Date	Code	SOTA
1	YOLOV++	93.2	Yes	Practical Video Object Detection via Feature Selection and Aggregation	2024-07-29	Code	SOTA
2	DiffusionVID (Swin-B)	92.5	No	-	-	Code	-
3	Ours (Def. DETR + SwinB)	91.3	No	Objects do not disappear: Video object detection by single-frame object location anticipation	2023-08-09	Code	SOTA
4	VSTAM	91.1	No	-	-	Code	-
5	TGBFormer (Swin B)	90.3	No	TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection	2025-03-18	-	-
6	TransVOD (Swin Base)	90.1	Yes	TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers	2022-01-13	Code	SOTA
7	PTSEFormer (ResNet-101)	88.1	No	PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection	2022-09-06	Code	-
8	Ours (Def. DETR + R101)	87.9	No	Objects do not disappear: Video object detection by single-frame object location anticipation	2023-08-09	Code	-
9	YOLOV	87.5	No	YOLOV: Making Still Image Object Detectors Great at Video Object Detection	2022-08-20	Code	-
10	Ours (Faster RCNN + R101)	87.2	No	Objects do not disappear: Video object detection by single-frame object location anticipation	2023-08-09	Code	-
11	DiffusionVID (ResNet-101)	87.1	No	-	-	Code	-
12	DAFA-F (ResNeXt-101)	85.9	No	-	-	-	-
13	ClipVID	85.8	No	Identity-Consistent Aggregation for Video Object Detection	2023-08-15	Code	-
14	HVRNet (ResNeXt101-32x4d)	85.5	No	-	-	Code	-
15	MEGA (ResNeXt101)	85.4	No	Memory Enhanced Global-Local Aggregation for Video Object Detection	2020-03-26	Code	SOTA
16	BoxMask (ResNeXt101)	84.8	No	BoxMask: Revisiting Bounding Box Supervision for Video Object Detection	2022-10-12	-	-
17	DAFA-F (ResNet-101)	84.5	No	-	-	-	-
18	Temporal ROI Align (ResNeXt101)	84.3	No	Temporal RoI Align for Video Object Recognition	2021-09-08	Code	-
19	SELSA (ResNeXt-101)	84.3	No	Sequence Level Semantics Aggregation for Video Object Detection	2019-07-15	Code	SOTA
20	REPP + SELSA (ResNet-101)	84.2	No	Robust and efficient post-processing for video object detection	2020-09-23	Code	-
21	HVRNet (ResNest101)	83.8	No	-	-	Code	-
22	Tracklet-Conditioned Detection+DCNv2+FGFA	83.5	No	Integrated Object Detection and Tracking with Tracklet-Conditioned Detection	2018-11-27	-	SOTA
23	SELSA (ResNet-101)	82.69	No	Sequence Level Semantics Aggregation for Video Object Detection	2019-07-15	Code	-
24	SLTnet FPN-X101	82.4	No	-	-	Code	-
25	LSTS (ResNet-101)	81.7	No	Learning Where to Focus for Efficient Video Object Detection	2019-11-13	Code	-
26	BoxMask (ResNet-50)	80.7	No	BoxMask: Revisiting Bounding Box Supervision for Video Object Detection	2022-10-12	-	-
27	SparseVOD (ResNet-50)	80.3	No	Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection	2022-10-05	-	-
28	REPP + FGFA	80.1	No	Robust and efficient post-processing for video object detection	2020-09-23	Code	-
29	FGFA + Seq-NMS	80.1	No	Flow-Guided Feature Aggregation for Video Object Detection	2017-03-29	Code	SOTA
30	Online TSM	76.3	No	TSM: Temporal Shift Module for Efficient Video Understanding	2018-11-20	Code	-
31	REPP + YOLOv3	75.1	No	Robust and efficient post-processing for video object detection	2020-09-23	Code	-
32	YOLOv3	68.6	No	Robust and efficient post-processing for video object detection	2020-09-23	Code	-
33	Looking Fast and Slow	63.9	Yes	Looking Fast and Slow: Memory-Guided Mobile Video Object Detection	2019-03-25	Code	-