Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dense Distinct Query for End-to-End Object Detection

Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen

2023-03-22 | CVPR 2023 | Object Detection

Abstract

One-to-one label assignment in object detection has successfully obviated the need for non-maximum suppression (NMS) as postprocessing and makes the pipeline end-to-end. However, it triggers a new dilemma: the widely used sparse queries cannot guarantee a high recall, while dense queries inevitably bring more similar queries and encounter optimization difficulties. Since both sparse and dense queries are problematic, what are the expected queries in end-to-end object detection? This paper shows that the solution should be Dense Distinct Queries (DDQ). Concretely, we first lay dense queries like traditional detectors and then select distinct ones for one-to-one assignments. DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors including FCN, R-CNN, and DETRs. Most impressively, DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also shares the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code can be found at https://github.com/jshilong/DDQ.
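
The "lay dense queries, then select distinct ones" step described in the abstract can be illustrated with a small sketch. The abstract does not say how distinctness is decided, so the code below assumes a greedy, class-agnostic, NMS-style filter over the boxes predicted by the dense queries. The function names (`iou`, `select_distinct_queries`) and the defaults (`iou_threshold=0.7`, `max_queries=300`) are hypothetical choices made for illustration, not values taken from this page; the official repository remains the reference for the actual method.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_distinct_queries(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.7,  # assumed value, for illustration only
    max_queries: int = 300,      # assumed cap, for illustration only
) -> List[int]:
    """Return indices of dense queries kept as 'distinct'.

    Queries are visited in descending score order. A query is kept only
    if its box overlaps every already-kept box by at most iou_threshold,
    regardless of predicted class (class-agnostic filtering).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if len(kept) >= max_queries:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Four dense queries: two near-duplicates, one exact duplicate, one far away.
    dense_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
    dense_scores = [0.9, 0.8, 0.7, 0.6]
    print(select_distinct_queries(dense_boxes, dense_scores, iou_threshold=0.5))
```

In a setup like this, only the surviving indices would be passed on to one-to-one assignment, so near-identical queries no longer compete for the same ground-truth box. A lower `iou_threshold` removes more near-duplicates at the cost of possibly dropping queries for heavily overlapping objects, which matters in crowded scenes.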

Results

| Task | Dataset | Model | AP | Recall | mMR |
|---|---|---|---|---|---|
| Object Detection | CrowdHuman (full body) | DDQ DETR (R50) | 93.8 | 98.7 | 39.7 |
| Object Detection | CrowdHuman (full body) | DDQ R-CNN (R50) | 93.5 | 98.6 | 40.4 |
| Object Detection | CrowdHuman (full body) | DDQ FCN (R50 One-Stage) | 92.7 | 98.2 | 41 |

The same nine results are also listed, with identical values, under the task labels 3D, 2D Classification, 2D Object Detection, and 16k.

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)