Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

16k on ImageNet VID

Metric: MAP (higher is better)

Results

| # | Model | MAP | Augmentations | Paper | Date | Code |
|---|-------|-----|---------------|-------|------|------|
| 1 | YOLOV++ | 93.2 | Yes | Practical Video Object Detection via Feature Sel... | 2024-07-29 | Code |
| 2 | DiffusionVID (Swin-B) | 92.5 | No | - | - | Code |
| 3 | Ours (Def. DETR + SwinB) | 91.3 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 4 | VSTAM | 91.1 | No | - | - | Code |
| 5 | TGBFormer (Swin B) | 90.3 | No | TGBFormer: Transformer-GraphFormer Blender Netwo... | 2025-03-18 | - |
| 6 | TransVOD (Swin Base) | 90.1 | Yes | TransVOD: End-to-End Video Object Detection with... | 2022-01-13 | Code |
| 7 | PTSEFormer (ResNet-101) | 88.1 | No | PTSEFormer: Progressive Temporal-Spatial Enhance... | 2022-09-06 | Code |
| 8 | Ours (Def. DETR + R101) | 87.9 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 9 | YOLOV | 87.5 | No | YOLOV: Making Still Image Object Detectors Great... | 2022-08-20 | Code |
| 10 | Ours (Faster RCNN + R101) | 87.2 | No | Objects do not disappear: Video object detection... | 2023-08-09 | Code |
| 11 | DiffusionVID (ResNet-101) | 87.1 | No | - | - | Code |
| 12 | DAFA-F (ResNeXt-101) | 85.9 | No | - | - | - |
| 13 | ClipVID | 85.8 | No | Identity-Consistent Aggregation for Video Object... | 2023-08-15 | Code |
| 14 | HVRNet (ResNeXt101-32x4d) | 85.5 | No | - | - | Code |
| 15 | MEGA (ResNeXt101) | 85.4 | No | Memory Enhanced Global-Local Aggregation for Vid... | 2020-03-26 | Code |
| 16 | BoxMask (ResNeXt101) | 84.8 | No | BoxMask: Revisiting Bounding Box Supervision for... | 2022-10-12 | - |
| 17 | DAFA-F (ResNet-101) | 84.5 | No | - | - | - |
| 18 | Temporal ROI Align (ResNeXt101) | 84.3 | No | Temporal RoI Align for Video Object Recognition | 2021-09-08 | Code |
| 19 | SELSA (ResNeXt-101) | 84.3 | No | Sequence Level Semantics Aggregation for Video O... | 2019-07-15 | Code |
| 20 | REPP + SELSA (ResNet-101) | 84.2 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 21 | HVRNet (ResNest101) | 83.8 | No | - | - | Code |
| 22 | Tracklet-Conditioned Detection+DCNv2+FGFA | 83.5 | No | Integrated Object Detection and Tracking with Tr... | 2018-11-27 | - |
| 23 | SELSA (ResNet-101) | 82.69 | No | Sequence Level Semantics Aggregation for Video O... | 2019-07-15 | Code |
| 24 | SLTnet FPN-X101 | 82.4 | No | - | - | Code |
| 25 | LSTS (ResNet-101) | 81.7 | No | Learning Where to Focus for Efficient Video Obje... | 2019-11-13 | Code |
| 26 | BoxMask (ResNet-50) | 80.7 | No | BoxMask: Revisiting Bounding Box Supervision for... | 2022-10-12 | - |
| 27 | SparseVOD (ResNet-50) | 80.3 | No | Spatio-Temporal Learnable Proposals for End-to-E... | 2022-10-05 | - |
| 28 | REPP + FGFA | 80.1 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 29 | FGFA + Seq-NMS | 80.1 | No | Flow-Guided Feature Aggregation for Video Object... | 2017-03-29 | Code |
| 30 | Online TSM | 76.3 | No | TSM: Temporal Shift Module for Efficient Video U... | 2018-11-20 | Code |
| 31 | REPP + YOLOv3 | 75.1 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 32 | YOLOv3 | 68.6 | No | Robust and efficient post-processing for video o... | 2020-09-23 | Code |
| 33 | Looking Fast and Slow | 63.9 | Yes | Looking Fast and Slow: Memory-Guided Mobile Vide... | 2019-03-25 | Code |