3D on ImageNet VID
Metric: MAP (higher is better)
Results
| # | Model | Badge | MAP | Augmentations | Paper | Date | Code |
|---|-------|-------|-----|---------------|-------|------|------|
| 1 | YOLOV++ | SOTA | 93.2 | Yes | Practical Video Object Detection via Feature Selection and Aggregation | 2024-07-29 | Code |
| 2 | DiffusionVID (Swin-B) | - | 92.5 | No | - | - | Code |
| 3 | Ours (Def. DETR + SwinB) | SOTA | 91.3 | No | Objects do not disappear: Video object detection by single-frame object location anticipation | 2023-08-09 | Code |
| 4 | VSTAM | - | 91.1 | No | - | - | Code |
| 5 | TGBFormer (Swin B) | - | 90.3 | No | TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | 2025-03-18 | - |
| 6 | TransVOD (Swin Base) | SOTA | 90.1 | Yes | TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | 2022-01-13 | Code |
| 7 | PTSEFormer (ResNet-101) | - | 88.1 | No | PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection | 2022-09-06 | Code |
| 8 | Ours (Def. DETR + R101) | - | 87.9 | No | Objects do not disappear: Video object detection by single-frame object location anticipation | 2023-08-09 | Code |
| 9 | YOLOV | - | 87.5 | No | YOLOV: Making Still Image Object Detectors Great at Video Object Detection | 2022-08-20 | Code |
| 10 | Ours (Faster RCNN + R101) | - | 87.2 | No | Objects do not disappear: Video object detection by single-frame object location anticipation | 2023-08-09 | Code |
| 11 | DiffusionVID (ResNet-101) | - | 87.1 | No | - | - | Code |
| 12 | DAFA-F (ResNeXt-101) | - | 85.9 | No | - | - | - |
| 13 | ClipVID | - | 85.8 | No | Identity-Consistent Aggregation for Video Object Detection | 2023-08-15 | Code |
| 14 | HVRNet (ResNeXt101-32x4d) | - | 85.5 | No | - | - | Code |
| 15 | MEGA (ResNeXt101) | SOTA | 85.4 | No | Memory Enhanced Global-Local Aggregation for Video Object Detection | 2020-03-26 | Code |
| 16 | BoxMask(ResNeXt101) | - | 84.8 | No | BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | 2022-10-12 | - |
| 17 | DAFA-F (ResNet-101) | - | 84.5 | No | - | - | - |
| 18 | Temporal ROI Align (ResNeXt101) | - | 84.3 | No | Temporal RoI Align for Video Object Recognition | 2021-09-08 | Code |
| 19 | SELSA (ResNeXt-101) | SOTA | 84.3 | No | Sequence Level Semantics Aggregation for Video Object Detection | 2019-07-15 | Code |
| 20 | REPP + SELSA (ResNet-101) | - | 84.2 | No | Robust and efficient post-processing for video object detection | 2020-09-23 | Code |
| 21 | HVRNet (ResNest101) | - | 83.8 | No | - | - | Code |
| 22 | Tracklet-Conditioned Detection+DCNv2+FGFA | SOTA | 83.5 | No | Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | 2018-11-27 | - |
| 23 | SELSA (ResNet-101) | - | 82.69 | No | Sequence Level Semantics Aggregation for Video Object Detection | 2019-07-15 | Code |
| 24 | SLTnet FPN-X101 | - | 82.4 | No | - | - | Code |
| 25 | LSTS (ResNet-101) | - | 81.7 | No | Learning Where to Focus for Efficient Video Object Detection | 2019-11-13 | Code |
| 26 | BoxMask (ResNet-50) | - | 80.7 | No | BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | 2022-10-12 | - |
| 27 | SparseVOD (ResNet-50) | - | 80.3 | No | Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection | 2022-10-05 | - |
| 28 | REPP + FGFA | - | 80.1 | No | Robust and efficient post-processing for video object detection | 2020-09-23 | Code |
| 29 | FGFA + Seq-NMS | SOTA | 80.1 | No | Flow-Guided Feature Aggregation for Video Object Detection | 2017-03-29 | Code |
| 30 | Online TSM | - | 76.3 | No | TSM: Temporal Shift Module for Efficient Video Understanding | 2018-11-20 | Code |
| 31 | REPP + YOLOv3 | - | 75.1 | No | Robust and efficient post-processing for video object detection | 2020-09-23 | Code |
| 32 | YOLOv3 | - | 68.6 | No | Robust and efficient post-processing for video object detection | 2020-09-23 | Code |
| 33 | Looking Fast and Slow | - | 63.9 | Yes | Looking Fast and Slow: Memory-Guided Mobile Video Object Detection | 2019-03-25 | Code |