
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Published: 2023-10-18 · NeurIPS 2023
Tasks: Long-tailed Object Detection, Object Detection
Links: Paper · PDF · Code (official)

Abstract

Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-world datasets, where many tail classes have scarce instances. One popular strategy is to exploit extra data with image-level labels, yet this produces limited results due to (1) semantic ambiguity -- an image-level label captures only a salient part of the image, ignoring the remaining rich semantics within it; and (2) location sensitivity -- the label depends heavily on the locations and crops of the original image, which may change after data transformations such as random cropping. To remedy this, we propose RichSem, a simple but effective method that robustly learns rich semantics from coarse locations without the need for accurate bounding boxes. RichSem leverages rich semantics from images, which then serve as additional soft supervision for training detectors. Specifically, we add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection. The semantic branch is used only during training and is removed at inference. RichSem achieves consistent improvements on both overall and rare categories of LVIS under different backbones and detectors. Our method achieves state-of-the-art performance without requiring complex training and testing procedures. Moreover, we show the effectiveness of our method on other long-tailed datasets with additional experiments. Code is available at \url{https://github.com/MengLcool/RichSem}.
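The abstract's core training idea -- a semantic branch whose predictions are pulled toward soft semantic targets, then discarded at inference -- can be sketched with a temperature-softened KL-divergence loss. This is a minimal illustration, not the authors' implementation: the function names (`soft_semantic_loss`, `softmax`) and the assumption that the soft targets come from some external teacher's logits are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.exp((logits - logits.max(axis=-1, keepdims=True)) / temperature)
    return z / z.sum(axis=-1, keepdims=True)

def soft_semantic_loss(branch_logits, teacher_logits, temperature=2.0):
    """KL divergence pulling the detector's semantic-branch predictions
    toward soft targets (hypothetical teacher logits). Used only during
    training; the branch contributing branch_logits would be removed
    at inference, leaving the detector's cost unchanged at test time."""
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(branch_logits, temperature)   # branch predictions
    kl = (p * (np.log(p + 1e-9) - np.log(q + 1e-9))).sum(axis=-1)
    return float(kl.mean())
```

In this sketch the loss is zero when the branch already matches the soft targets and positive otherwise, so adding it to the detection loss nudges feature representations toward the richer semantics without requiring box-level labels for the extra data.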

Results

| Task             | Dataset       | Metric  | Value | Model                                                        |
|------------------|---------------|---------|-------|--------------------------------------------------------------|
| Object Detection | LVIS v1.0 val | box AP  | 61.2  | RichSem (Focal-H + ImageNet as weakly-supervised extra data) |
| Object Detection | LVIS v1.0 val | box APr | 61.2  | RichSem (Focal-H + ImageNet as weakly-supervised extra data) |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)