Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Described Object Detection: Liberating Object Detection with Flexible Expressions

Chi Xie, Zhao Zhang, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang

2023-07-24 | NeurIPS 2023 11
Tasks: Described Object Detection, Referring Expression, Binary Classification, Referring Expression Comprehension, Open Vocabulary Object Detection, object-detection, Object Detection

Abstract

Detecting objects based on language information is a popular task that includes Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance both to a more practical setting called Described Object Detection (DOD) by expanding the category names used in OVD to flexible language expressions and by overcoming the limitation that REC grounds only objects already known to exist in the image. We establish the research foundation for DOD by constructing a Description Detection Dataset ($D^3$). This dataset features flexible language expressions, from short category names to long descriptions, and annotates all described objects on all images without omission. By evaluating previous SOTA methods on $D^3$, we identify several difficulties that cause current REC, OVD, and bi-functional methods to fail. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods are limited by long and complex descriptions. Recent bi-functional methods also perform poorly on DOD because their training procedures and inference strategies for REC and OVD are kept separate. Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube and related works are tracked at https://github.com/Charles-Xie/awesome-described-object-detection.
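
To make the DOD setting concrete, the following is a minimal sketch of the output contract the abstract describes: for one (image, description) pair, a detector may return zero, one, or many boxes, and a binary "is the described object present?" decision is used to reject negatives. This is not the authors' implementation. The names `Detection` and `described_object_detection`, the `presence_prob` input, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


@dataclass
class Detection:
    box: Box
    score: float  # confidence that this box matches the description


def described_object_detection(
    candidates: List[Detection],
    presence_prob: float,
    presence_threshold: float = 0.5,  # hypothetical value
    score_threshold: float = 0.3,     # hypothetical value
) -> List[Detection]:
    """Return zero or more detections for one (image, description) pair.

    `candidates` stands in for boxes proposed by some grounding model, and
    `presence_prob` stands in for the output of a binary classifier that
    predicts whether the description matches anything in the image.
    """
    # Negative case: the description does not apply to this image,
    # so return nothing instead of forcing a box as a REC model would.
    if presence_prob < presence_threshold:
        return []
    # Multi-target case: keep every sufficiently confident box, not only the best one.
    return [d for d in candidates if d.score >= score_threshold]


candidates = [
    Detection((0.0, 0.0, 10.0, 10.0), 0.8),
    Detection((5.0, 5.0, 20.0, 20.0), 0.6),
    Detection((1.0, 1.0, 2.0, 2.0), 0.1),
]
positive = described_object_detection(candidates, presence_prob=0.9)
negative = described_object_detection(candidates, presence_prob=0.1)
```

Here `positive` keeps the two confident boxes and `negative` is empty, which mirrors the two behaviours the abstract says REC methods lack: handling multiple targets and rejecting descriptions that match nothing.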

Results

Task                 Dataset                        Metric                   Value  Model
Object Detection     Description Detection Dataset  Intra-scenario ABS mAP   15.4   OFA-DOD-base
Object Detection     Description Detection Dataset  Intra-scenario FULL mAP  21.6   OFA-DOD-base
Object Detection     Description Detection Dataset  Intra-scenario PRES mAP  23.7   OFA-DOD-base
3D                   Description Detection Dataset  Intra-scenario ABS mAP   15.4   OFA-DOD-base
3D                   Description Detection Dataset  Intra-scenario FULL mAP  21.6   OFA-DOD-base
3D                   Description Detection Dataset  Intra-scenario PRES mAP  23.7   OFA-DOD-base
2D Classification    Description Detection Dataset  Intra-scenario ABS mAP   15.4   OFA-DOD-base
2D Classification    Description Detection Dataset  Intra-scenario FULL mAP  21.6   OFA-DOD-base
2D Classification    Description Detection Dataset  Intra-scenario PRES mAP  23.7   OFA-DOD-base
2D Object Detection  Description Detection Dataset  Intra-scenario ABS mAP   15.4   OFA-DOD-base
2D Object Detection  Description Detection Dataset  Intra-scenario FULL mAP  21.6   OFA-DOD-base
2D Object Detection  Description Detection Dataset  Intra-scenario PRES mAP  23.7   OFA-DOD-base
16k                  Description Detection Dataset  Intra-scenario ABS mAP   15.4   OFA-DOD-base
16k                  Description Detection Dataset  Intra-scenario FULL mAP  21.6   OFA-DOD-base
16k                  Description Detection Dataset  Intra-scenario PRES mAP  23.7   OFA-DOD-base

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework (2025-07-10)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)