Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

2022-03-07 | Real-Time Object Detection, Object Detection

Abstract

We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.

Results

Task              Dataset        Metric                Value  Model
Object Detection  COCO test-dev  box mAP               63.3   DINO (Swin-L, multi-scale, TTA)
Object Detection  COCO-O         Average mAP           42.1   DINO (Swin-L)
Object Detection  COCO-O         Effective Robustness  15.76  DINO (Swin-L)
Object Detection  SA-Det-100k    AP                    43.7   DINO (ResNet50 1x VFL)
Object Detection  SA-Det-100k    AP50                  52     DINO (ResNet50 1x VFL)
Object Detection  SA-Det-100k    AP75                  47.7   DINO (ResNet50 1x VFL)
Object Detection  SA-Det-100k    APL                   61.5   DINO (ResNet50 1x VFL)
Object Detection  SA-Det-100k    APM                   43     DINO (ResNet50 1x VFL)
Object Detection  SA-Det-100k    APS                   5.8    DINO (ResNet50 1x VFL)
Object Detection  COCO minival   box AP                63.2   DINO (Swin-L)
Object Detection  COCO minival   AP50                  69.1   DINO-5scale (24 epoch)
Object Detection  COCO minival   AP75                  56     DINO-5scale (24 epoch)
Object Detection  COCO minival   APL                   65.8   DINO-5scale (24 epoch)
Object Detection  COCO minival   APM                   54.2   DINO-5scale (24 epoch)
Object Detection  COCO minival   APS                   34.5   DINO-5scale (24 epoch)
Object Detection  COCO minival   box AP                51.3   DINO-5scale (24 epoch)
Object Detection  COCO minival   AP50                  69     DINO-5scale (36 epoch)
Object Detection  COCO minival   AP75                  55.8   DINO-5scale (36 epoch)
Object Detection  COCO minival   APL                   65.3   DINO-5scale (36 epoch)
Object Detection  COCO minival   APM                   54.3   DINO-5scale (36 epoch)
Object Detection  COCO minival   APS                   35     DINO-5scale (36 epoch)
Object Detection  COCO minival   box AP                51.2   DINO-5scale (36 epoch)

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)