3D on ImageNet VID
Metric: MAP (higher is better)
Results
| # | Model | MAP | Augmentations | Paper | Date | Code |
|---|-------|-----|---------------|-------|------|------|
| 1 | YOLOV++ | 93.2 | Yes | Practical Video Object Detection via Feature Sel... | 2024-07-29 | Code |
| 2 | DiffusionVID (Swin-B) | 92.5 | No | - | - | Code |
| 3 | Ours (Def. DETR + SwinB) | 91.3 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 4 | VSTAM | 91.1 | No | - | - | Code |
| 5 | TGBFormer (Swin B) | 90.3 | No | TGBFormer: Transformer-GraphFormer Blender Netwo... | 2025-03-18 | - |
| 6 | TransVOD (Swin Base) | 90.1 | Yes | TransVOD: End-to-End Video Object Detection with... | 2022-01-13 | Code |
| 7 | PTSEFormer (ResNet-101) | 88.1 | No | PTSEFormer: Progressive Temporal-Spatial Enhance... | 2022-09-06 | Code |
| 8 | Ours (Def. DETR + R101) | 87.9 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 9 | YOLOV | 87.5 | No | YOLOV: Making Still Image Object Detectors Great... | 2022-08-20 | Code |
| 10 | Ours (Faster RCNN + R101) | 87.2 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 11 | DiffusionVID (ResNet-101) | 87.1 | No | - | - | Code |
| 12 | DAFA-F (ResNeXt-101) | 85.9 | No | - | - | - |
| 13 | ClipVID | 85.8 | No | Identity-Consistent Aggregation for Video Object... | 2023-08-15 | Code |
| 14 | HVRNet (ResNeXt101-32x4d) | 85.5 | No | - | - | Code |
| 15 | MEGA (ResNeXt101) | 85.4 | No | Memory Enhanced Global-Local Aggregation for Vid... | 2020-03-26 | Code |
| 16 | BoxMask(ResNeXt101) | 84.8 | No | BoxMask: Revisiting Bounding Box Supervision for... | 2022-10-12 | - |
| 17 | DAFA-F (ResNet-101) | 84.5 | No | - | - | - |
| 18 | Temporal ROI Align (ResNeXt101) | 84.3 | No | Temporal RoI Align for Video Object Recognition | 2021-09-08 | Code |
| 19 | SELSA (ResNeXt-101) | 84.3 | No | Sequence Level Semantics Aggregation for Video O... | 2019-07-15 | Code |
| 20 | REPP + SELSA (ResNet-101) | 84.2 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 21 | HVRNet (ResNest101) | 83.8 | No | - | - | Code |
| 22 | Tracklet-Conditioned Detection+DCNv2+FGFA | 83.5 | No | Integrated Object Detection and Tracking with Tr... | 2018-11-27 | - |
| 23 | SELSA (ResNet-101) | 82.69 | No | Sequence Level Semantics Aggregation for Video O... | 2019-07-15 | Code |
| 24 | SLTnet FPN-X101 | 82.4 | No | - | - | Code |
| 25 | LSTS (ResNet-101) | 81.7 | No | Learning Where to Focus for Efficient Video Obje... | 2019-11-13 | Code |
| 26 | BoxMask (ResNet-50) | 80.7 | No | BoxMask: Revisiting Bounding Box Supervision for... | 2022-10-12 | - |
| 27 | SparseVOD (ResNet-50) | 80.3 | No | Spatio-Temporal Learnable Proposals for End-to-E... | 2022-10-05 | - |
| 28 | REPP + FGFA | 80.1 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 29 | FGFA + Seq-NMS | 80.1 | No | Flow-Guided Feature Aggregation for Video Object... | 2017-03-29 | Code |
| 30 | Online TSM | 76.3 | No | TSM: Temporal Shift Module for Efficient Video U... | 2018-11-20 | Code |
| 31 | REPP + YOLOv3 | 75.1 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 32 | YOLOv3 | 68.6 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 33 | Looking Fast and Slow | 63.9 | Yes | Looking Fast and Slow: Memory-Guided Mobile Vide... | 2019-03-25 | Code |