Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection

Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng

2024-10-08 · Open World Object Detection · Open Vocabulary Object Detection · Object Detection
Paper · PDF · Code (official)

Abstract

Open-vocabulary detection (OVD) aims to detect novel objects without instance-level annotations to achieve open-world object detection at a lower cost. Existing OVD methods mainly rely on the powerful open-vocabulary image-text alignment capability of Vision-Language Pretrained Models (VLM) such as CLIP. However, CLIP is trained on image-text pairs and lacks the perceptual ability for local regions within an image, resulting in the gap between image and region representations. Directly using CLIP for OVD causes inaccurate region classification. We find the image-region gap is primarily caused by the deformation of region feature maps during region of interest (RoI) extraction. To mitigate the inaccurate region classification in OVD, we propose a new Shape-Invariant Adapter named SIA-OVD to bridge the image-region gap in the OVD task. SIA-OVD learns a set of feature adapters for regions with different shapes and designs a new adapter allocation mechanism to select the optimal adapter for each region. The adapted region representations can align better with text representations learned by CLIP. Extensive experiments demonstrate that SIA-OVD effectively improves the classification accuracy for regions by addressing the gap between images and regions caused by shape deformation. SIA-OVD achieves substantial improvements over representative methods on the COCO-OVD benchmark. The code is available at https://github.com/PKU-ICST-MIPL/SIA-OVD_ACMMM2024.
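The core idea described in the abstract — maintain a set of shape-specific feature adapters, pick one per region via an allocation mechanism, and score the adapted RoI feature against CLIP text embeddings — can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration: the adapter count, dimensions, and especially the aspect-ratio bucketing rule (SIA-OVD *learns* its allocation mechanism; see the official repository for the actual implementation).

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, K = 64, 4  # feature dimension and number of shape-specific adapters (hypothetical sizes)

# One lightweight linear adapter per shape group, initialized near identity.
# In the paper these are learned jointly with the allocation mechanism.
adapters = [np.eye(DIM) + rng.normal(scale=0.02, size=(DIM, DIM)) for _ in range(K)]

def allocate_adapter(box):
    """Toy allocation rule: bucket a region by its aspect ratio.
    This heuristic only illustrates shape-conditioned adapter selection;
    the actual mechanism in SIA-OVD is learned, not hand-coded."""
    x1, y1, x2, y2 = box
    ratio = (x2 - x1) / max(y2 - y1, 1e-6)
    edges = [0.5, 1.0, 2.0]  # assumed aspect-ratio bucket boundaries
    return sum(ratio > e for e in edges)  # adapter index in [0, K-1]

def classify_region(roi_feat, box, text_embeds):
    """Adapt an RoI feature with its shape-matched adapter, then score it
    against CLIP-style text embeddings by cosine similarity."""
    a = adapters[allocate_adapter(box)]
    feat = a @ roi_feat
    feat /= np.linalg.norm(feat)
    sims = text_embeds @ feat  # rows of text_embeds assumed L2-normalized
    return int(np.argmax(sims)), sims

# Usage: one region, three candidate category text embeddings.
text = rng.normal(size=(3, DIM))
text /= np.linalg.norm(text, axis=1, keepdims=True)
roi = rng.normal(size=DIM)
label, scores = classify_region(roi, (10, 20, 90, 40), text)
```

The point of the sketch is the dispatch step: a wide, flat box and a tall, narrow box are transformed by different adapters before classification, which is how the method compensates for shape deformation introduced during RoI extraction.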

Results

| Task                             | Dataset | Metric | Value | Model             |
|----------------------------------|---------|--------|-------|-------------------|
| Object Detection                 | MSCOCO  | AP 0.5 | 41.9  | SIA-OVD (RN50x4)  |
| Object Detection                 | MSCOCO  | AP 0.5 | 35.5  | SIA-OVD (RN50)    |
| Open Vocabulary Object Detection | MSCOCO  | AP 0.5 | 41.9  | SIA-OVD (RN50x4)  |
| Open Vocabulary Object Detection | MSCOCO  | AP 0.5 | 35.5  | SIA-OVD (RN50)    |

Related Papers

Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)