Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DEIM: DETR with Improved Matching for Fast Convergence

Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen

Published: 2024-12-05 | Venue: CVPR 2025 | Tasks: Data Augmentation, Real-Time Object Detection, Object Detection

Abstract

We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection. Our code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.
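The idea behind Dense O2O can be illustrated with a small sketch. The abstract says only that additional targets come from "standard data augmentation techniques"; the 2x2 mosaic used here is an assumption chosen for illustration, and the function name `mosaic_boxes` is hypothetical rather than taken from the DEIM code base. Because one-to-one matching assigns exactly one query to each target, merging the boxes of several images into one training sample raises the number of positive matches for that sample.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def mosaic_boxes(per_image_boxes: List[List[Box]],
                 width: float, height: float) -> List[Box]:
    """Merge the boxes of four same-sized images into one 2x2 mosaic.

    Assumption (not from the paper): each image is shrunk by 0.5 and
    placed in one quadrant, so the mosaic keeps the original canvas size.
    """
    if len(per_image_boxes) != 4:
        raise ValueError("a 2x2 mosaic needs exactly four images")

    # Top-left corner of each quadrant on the output canvas.
    offsets = [(0.0, 0.0), (width / 2, 0.0),
               (0.0, height / 2), (width / 2, height / 2)]

    merged: List[Box] = []
    for boxes, (dx, dy) in zip(per_image_boxes, offsets):
        for x0, y0, x1, y1 in boxes:
            # Scale the box by 0.5, then shift it into its quadrant.
            merged.append((x0 * 0.5 + dx, y0 * 0.5 + dy,
                           x1 * 0.5 + dx, y1 * 0.5 + dy))
    return merged


# Four 640x640 images with 1, 1, 0 and 1 annotated objects.
per_image = [
    [(0.0, 0.0, 100.0, 100.0)],
    [(10.0, 20.0, 30.0, 40.0)],
    [],
    [(0.0, 0.0, 640.0, 640.0)],
]
merged = mosaic_boxes(per_image, 640.0, 640.0)
print(len(merged))  # targets available to one-to-one matching in this sample
```

In this toy input each source image offers at most one target, while the mosaic offers three, so three queries receive positive supervision in a single forward pass. The scaling also shrinks objects, which hints at why denser matching can introduce lower-quality matches that the abstract's Matchability-Aware Loss is meant to handle.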

Results

Task              Dataset                            Metric  Value  Model
Object Detection  COCO (Common Objects in Context)   box AP  59.5   DEIM-D-FINE-X+
Object Detection  COCO (Common Objects in Context)   box AP  56.5   DEIM-D-FINE-X
Object Detection  COCO (Common Objects in Context)   box AP  54.7   DEIM-D-FINE-L
Object Detection  COCO (Common Objects in Context)   box AP  52.7   DEIM-D-FINE-M
Object Detection  COCO (Common Objects in Context)   box AP  49     DEIM-D-FINE-S

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)