Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

Peixian Chen, Kekai Sheng, Mengdan Zhang, Mingbao Lin, Yunhang Shen, Shaohui Lin, Bo Ren, Ke Li

2022-06-22 | Causal Inference | Open Vocabulary Object Detection | Object Detection

Abstract

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary. Recent work resorts to the rich knowledge in pre-trained vision-language models. However, existing methods are ineffective in proposal-level vision-language alignment. Meanwhile, the models usually suffer from confidence bias toward base categories and perform worse on novel ones. To overcome the challenges, we present MEDet, a novel and effective OVD framework with proposal mining and prediction equalization. First, we design an online proposal mining to refine the inherited vision-semantic knowledge from coarse to fine, allowing for proposal-level detection-oriented feature alignment. Second, based on causal inference theory, we introduce a class-wise backdoor adjustment to reinforce the predictions on novel categories to improve the overall OVD performance. Extensive experiments on COCO and LVIS benchmarks verify the superiority of MEDet over the competing approaches in detecting objects of novel categories, e.g., 32.6% AP50 on COCO and 22.4% mask mAP on LVIS.
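
The abstract says the prediction equalization step builds on backdoor adjustment from causal inference. As a rough illustration only, the sketch below evaluates the generic backdoor formula P(y | do(x)) = sum over z of P(y | x, z) * P(z) for a discrete confounder z. It is not MEDet's class-wise variant, whose details are not given here. The function name `backdoor_adjust`, the confounder labels, and all probabilities are hypothetical, made-up values.

```python
from typing import Dict


def backdoor_adjust(p_y_given_x_z: Dict[str, float], p_z: Dict[str, float]) -> float:
    """Generic backdoor adjustment for a discrete confounder z.

    Computes P(y | do(x)) = sum_z P(y | x, z) * P(z).
    This is the textbook formula, not the paper's class-wise version.
    """
    if abs(sum(p_z.values()) - 1.0) > 1e-9:
        raise ValueError("p_z must sum to 1")
    # Average the conditional prediction over the confounder's prior,
    # rather than over its distribution conditioned on x.
    return sum(p_y_given_x_z[z] * p_z[z] for z in p_z)


# Toy numbers, invented purely for illustration.
p_z = {"z0": 0.7, "z1": 0.3}            # prior over confounder values
p_y_given_x_z = {"z0": 0.2, "z1": 0.9}  # P(y | x, z) for each z

adjusted = backdoor_adjust(p_y_given_x_z, p_z)
print(round(adjusted, 2))
```

The point of the adjustment is that each confounder value is weighted by its prior P(z), so a confounder that happens to co-occur with x does not dominate the prediction.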

Results

Task                              | Dataset   | Metric                       | Value (%) | Model
Object Detection                  | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
Object Detection                  | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
3D                                | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
3D                                | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
2D Classification                 | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
2D Classification                 | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
2D Object Detection               | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
2D Object Detection               | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
Open Vocabulary Object Detection  | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
Open Vocabulary Object Detection  | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
16k                               | LVIS v1.0 | AP novel-LVIS base training  | 22.4      | MEDet
16k                               | MSCOCO    | AP 0.5                       | 32.6      | MEDet (RN50)
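
The "AP 0.5" metric (AP50 in the abstract) counts a predicted box as correct when its intersection over union (IoU) with a same-class ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not a full AP computation. It assumes boxes in `(x1, y1, x2, y2)` corner format, and the helper names `iou` and `is_match` are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough for AP at `threshold`."""
    return iou(pred, gt) >= threshold


gt_box = (0.0, 0.0, 4.0, 4.0)
good_pred = (0.0, 0.0, 4.0, 3.0)   # IoU = 12 / 16 = 0.75
poor_pred = (2.0, 0.0, 6.0, 4.0)   # IoU = 8 / 24, about 0.33

print(is_match(good_pred, gt_box), is_match(poor_pred, gt_box))
```

A full AP50 score additionally ranks detections by confidence and integrates precision over recall per class, which this sketch leaves out.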
