Enhancing Novel Object Detection via Cooperative Foundational Models

Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

2023-11-19
Tasks: Open Vocabulary Object Detection, Novel Object Detection, Novel Class Discovery, Object Detection


Abstract

In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $\text{AP}_{50}$ for novel classes. Our code is available at https://rohit901.github.io/coop-foundation-models/.
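The abstract does not spell out the cooperative mechanism itself, so the sketch below only illustrates the generic idea behind using a CLIP-style model for open-vocabulary labeling: each detected region is assigned the class whose text embedding is most similar (by cosine similarity) to the region's image embedding, which is what lets novel categories be handled by simply adding their names. It is not the authors' implementation. The function name `classify_regions`, the temperature value, and the toy 3-dimensional embeddings are assumptions; in practice the embeddings would come from CLIP's image and text encoders.

```python
import numpy as np


def classify_regions(region_emb, text_emb, class_names, temperature=0.01):
    """Label each region with its most similar class (hypothetical helper).

    region_emb:  (R, D) array, one embedding per region (e.g. an image crop).
    text_emb:    (C, D) array, one text embedding per class name.
    class_names: list of C class names, known and novel mixed together.
    Returns a list of (class_name, probability) pairs, one per region.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (r @ t.T) / temperature              # (R, C) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax over classes
    best = probs.argmax(axis=1)
    return [(class_names[c], float(probs[k, c])) for k, c in enumerate(best)]


# Toy example: rows of text_emb stand in for text embeddings of class prompts;
# "zebra" plays the role of a novel class that was never a training label.
names = ["person", "car", "zebra"]
text_emb = np.eye(3)
region_emb = np.array([[0.9, 0.1, 0.0], [0.05, 0.1, 0.95]])
print(classify_regions(region_emb, text_emb, names))
```

Because the vocabulary is just a matrix of text embeddings, adding a novel category in this sketch amounts to appending one more row to `text_emb`; no detector retraining is involved in this step.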

Results

Task	Dataset	Metric	Value	Model
Object Detection	MSCOCO	AP50	50.3	Cooperative Foundational Models
2D Object Detection	MSCOCO	AP50	50.3	Cooperative Foundational Models
Open Vocabulary Object Detection	MSCOCO	AP50	50.3	Cooperative Foundational Models
2D Object Detection	LVIS v1.0 val	All mAP	19.33	Cooperative Foundational Models
2D Object Detection	LVIS v1.0 val	Known mAP	42.08	Cooperative Foundational Models
2D Object Detection	LVIS v1.0 val	Novel mAP	17.42	Cooperative Foundational Models
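The MSCOCO figure is reported as AP50: average precision where a detection counts as correct when its IoU with a not-yet-matched ground-truth box is at least 0.5. The sketch below computes that quantity for a single class on a single image, using all-point interpolation of the precision-recall curve. It is a simplified illustration only. The official COCO and LVIS numbers come from their evaluation toolkits (e.g. pycocotools), which aggregate over images and classes, use 101-point interpolation, and handle dataset-specific details such as COCO crowd regions and LVIS non-exhaustive annotations; standard LVIS mAP additionally averages over IoU thresholds from 0.5 to 0.95. The function names and the (x1, y1, x2, y2) box format are assumptions.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point interpolated AP for one class in one image (hypothetical helper).

    detections: list of (score, box); gt_boxes: list of boxes.
    """
    if not gt_boxes:
        return 0.0
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = [False] * len(gt_boxes)
    tp = np.zeros(len(dets))
    for k, (_, box) in enumerate(dets):
        # Greedily match each detection to its best-overlapping ground truth.
        ious = [iou(box, g) for g in gt_boxes]
        j = int(np.argmax(ious))
        if ious[j] >= iou_thr and not matched[j]:
            matched[j] = True
            tp[k] = 1.0
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / len(gt_boxes)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # Precision envelope: make precision monotonically non-increasing in recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the envelope, summed over the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))]
print(average_precision(dets, gt))  # ≈ 0.833 (5/6)
```

Here the middle detection is a false positive, so precision drops to 0.5 before the third detection recovers the second object; the envelope lifts that dip to 2/3, giving 0.5 × 1 + 0.5 × 2/3 = 5/6.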
