Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Junjie Wang, Bin Chen, Bin Kang, Yulin Li, YiChi Chen, Weizhi Xian, Huifeng Chang, Yong Xu

2024-05-28
Tags: Denoising, Zero-Shot Object Detection, Contrastive Learning, Open Vocabulary Object Detection, Object Detection

Abstract

Open-vocabulary detection aims to detect objects from novel categories beyond the base categories on which the detector is trained. However, existing open-vocabulary detectors trained on base category data tend to assign higher confidence to trained categories and confuse novel categories with the background. To resolve this, we propose OV-DQUO, an Open-Vocabulary DETR with Denoising text Query training and open-world Unknown Objects supervision. Specifically, we introduce a wildcard matching method. This method enables the detector to learn from pairs of unknown objects recognized by the open-world detector and text embeddings with general semantics, mitigating the confidence bias between base and novel categories. Additionally, we propose a denoising text query training strategy. It synthesizes foreground and background query-box pairs from open-world unknown objects to train the detector through contrastive learning, enhancing its ability to distinguish novel objects from the background. We conducted extensive experiments on the challenging OV-COCO and OV-LVIS benchmarks, achieving new state-of-the-art results of 45.6 AP50 and 39.3 mAP on novel categories, respectively, without the need for additional training data. Models and code are released at https://github.com/xiaomoguhz/OV-DQUO
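
The abstract does not say how the foreground and background query-box pairs are synthesized. The following is only a minimal sketch of one plausible reading, not the authors' implementation: it assumes a pair is made by jittering the box of an unknown object and labelling the result by its overlap with the original box. The function names (`jitter`, `synthesize_pairs`), the noise scales, and the 0.5 overlap threshold are all hypothetical, and the text query embedding that would accompany each box is omitted.

```python
import random

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def jitter(box, scale, rng):
    """Shift the box by up to `scale` times its width and height."""
    w, h = box[2] - box[0], box[3] - box[1]
    dx = rng.uniform(-scale, scale) * w
    dy = rng.uniform(-scale, scale) * h
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

def synthesize_pairs(unknown_box, n=4, fg_scale=0.1, bg_scale=1.5,
                     fg_iou=0.5, seed=0):
    """Hypothetical pair synthesis: n lightly jittered and n heavily
    jittered copies of an unknown-object box, each labelled by its
    overlap with the original box (an assumption, not from the paper)."""
    rng = random.Random(seed)
    pairs = []
    for scale in (fg_scale, bg_scale):
        for _ in range(n):
            noisy = jitter(unknown_box, scale, rng)
            overlap = iou(noisy, unknown_box)
            label = "foreground" if overlap >= fg_iou else "background"
            pairs.append((noisy, label))
    return pairs

pairs = synthesize_pairs((0.0, 0.0, 10.0, 10.0))
```

In this sketch the lightly jittered boxes always stay above the overlap threshold, while the heavily jittered ones are labelled by whatever overlap they end up with. A contrastive objective would then pull foreground pairs toward the object and push background pairs away; that step is left out here.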

Results

Dataset     Metric                        Value   Model
LVIS v1.0   AP novel-LVIS base training   39.3    OV-DQUO(ViT-L/14)
LVIS v1.0   AP novel-LVIS base training   29.7    OV-DQUO(ViT-B/16)
MSCOCO      AP 0.5                        45.6    OV-DQUO(RN50x4)
MSCOCO      AP 0.5                        39.2    OV-DQUO(R50)

These four results are listed identically under each of the following task tags: Object Detection, 3D, 2D Classification, 2D Object Detection, Open Vocabulary Object Detection, 16k.
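
For readers unfamiliar with the AP 0.5 metric reported for MSCOCO, the sketch below illustrates the overlap criterion behind it: a predicted box counts as a match for a ground-truth box only when their intersection-over-union (IoU) is at least 0.5. This is a simplified illustration with hypothetical function names, not the benchmark's evaluation code, which also handles confidence ranking and per-category averaging.

```python
def box_iou(pred, gt):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union > 0 else 0.0

def matches_at_iou(pred, gt, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count
    as a match at the given IoU threshold (0.5 for AP 0.5)."""
    return box_iou(pred, gt) >= threshold

# A prediction twice as tall as the ground truth: IoU is exactly 0.5.
tall_match = matches_at_iou((0, 0, 10, 10), (0, 0, 10, 5))
# A prediction shifted diagonally by half its size: IoU is 25/175.
shifted_match = matches_at_iou((0, 0, 10, 10), (5, 5, 15, 15))
```

Here `tall_match` is true and `shifted_match` is false, which shows how quickly a localization offset disqualifies a detection at this threshold.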

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)