Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DETRs with Collaborative Hybrid Assignments Training

Zhuofan Zong, Guanglu Song, Yu Liu

2022-11-22 | ICCV 2023 | Instance Segmentation, Object Detection

Abstract

In this paper, we observe that, in DETR with one-to-one set matching, too few queries are assigned as positive samples, which leads to sparse supervision on the encoder's output. This considerably hurts the discriminative feature learning of the encoder and, vice versa, the attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. During inference, these auxiliary heads are discarded, so our method introduces no additional parameters or computational cost to the original detector and requires no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, when combined with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Code is available at https://github.com/Sense-X/Co-DETR.
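The sparse-supervision observation in the abstract can be illustrated with a toy comparison of how many candidates each assignment style marks as positive. This is a minimal sketch and not the authors' implementation: the boxes, the IoU-only matching score, the brute-force matcher, and the 0.5 threshold are all assumptions chosen for illustration (real DETR matching also uses classification and L1 cost terms, and ATSS and Faster RCNN use their own assignment rules).

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_positives(candidates, gts):
    """One-to-one set matching: every ground-truth box gets exactly one candidate.

    Brute force over all assignments (fine for toy sizes only); the score is
    the summed IoU, a simplifying assumption. Requires len(candidates) >= len(gts).
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(candidates)), len(gts)):
        score = sum(iou(candidates[c], gts[g]) for g, c in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return set(best)


def one_to_many_positives(candidates, gts, thr=0.5):
    """One-to-many assignment: any candidate overlapping a ground truth enough is positive.

    The fixed IoU threshold is a hypothetical stand-in for rules such as ATSS.
    """
    return {c for c, box in enumerate(candidates)
            if any(iou(box, g) >= thr for g in gts)}


# Hypothetical toy data: two objects, six candidate boxes.
gts = [(10, 10, 50, 50), (60, 60, 100, 100)]
candidates = [
    (10, 10, 50, 50),    # exact hit on object 0
    (12, 12, 52, 52),    # close to object 0
    (8, 8, 48, 48),      # close to object 0
    (60, 60, 100, 100),  # exact hit on object 1
    (62, 58, 102, 98),   # close to object 1
    (0, 0, 5, 5),        # background
]

print(len(one_to_one_positives(candidates, gts)))   # prints 2
print(len(one_to_many_positives(candidates, gts)))  # prints 5
```

With one-to-one matching the number of positives is capped at the number of objects, while the one-to-many rule also supervises the near-miss candidates. That denser signal is what the auxiliary heads are described as contributing during training.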

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | LVIS v1.0 minival | box AP | 72 | Co-DETR (single-scale) |
| Object Detection | COCO test-dev | Params (M) | 304 | Co-DETR |
| Object Detection | COCO test-dev | box mAP | 66 | Co-DETR |
| Object Detection | COCO test-dev | Params (M) | 218 | Co-DETR (Swin-L) |
| Object Detection | COCO test-dev | box mAP | 64.8 | Co-DETR (Swin-L) |
| Object Detection | COCO minival | Params (M) | 314 | Co-DETR |
| Object Detection | COCO minival | box AP | 65.9 | Co-DETR |
| Object Detection | COCO minival | Params (M) | 218 | Co-DETR (Swin-L) |
| Object Detection | COCO minival | box AP | 64.7 | Co-DETR (Swin-L) |
| Object Detection | LVIS v1.0 val | box AP | 68 | Co-DETR (single-scale) |
| Instance Segmentation | COCO minival | AP50 | 79.7 | Co-DETR |
| Instance Segmentation | COCO minival | AP75 | 62.8 | Co-DETR |
| Instance Segmentation | COCO minival | APL | 74.6 | Co-DETR |
| Instance Segmentation | COCO minival | APM | 59.7 | Co-DETR |
| Instance Segmentation | COCO minival | APS | 38.9 | Co-DETR |
| Instance Segmentation | COCO minival | mask AP | 56.6 | Co-DETR |
| Instance Segmentation | COCO test-dev | AP50 | 80.2 | Co-DETR |
| Instance Segmentation | COCO test-dev | AP75 | 63.4 | Co-DETR |
| Instance Segmentation | COCO test-dev | APL | 72 | Co-DETR |
| Instance Segmentation | COCO test-dev | APM | 60.1 | Co-DETR |
| Instance Segmentation | COCO test-dev | APS | 41.6 | Co-DETR |
| Instance Segmentation | COCO test-dev | mask AP | 57.1 | Co-DETR |
| Instance Segmentation | LVIS v1.0 val | mask AP | 60.7 | Co-DETR (single-scale) |

The same ten Object Detection rows are also listed under the task labels 3D, 2D Classification, 2D Object Detection, and 16k, with identical datasets, metrics, values, and models.

Related Papers

SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation (2025-07-08)