Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BoxMask: Revisiting Bounding Box Supervision for Video Object Detection

Khurram Azeem Hashmi, Alain Pagani, Didier Stricker, Muhammad Zeshan Afzal

Published: 2022-10-12
Tasks: Video Object Detection, Object Detection

Abstract

We present a new, simple yet effective approach to improve video object detection. We observe that prior works rely on instance-level feature aggregation, which inherently neglects refined pixel-level representations and leads to confusion among objects that share similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which learns discriminative representations by incorporating class-aware pixel-level information. We simply treat bounding-box annotations as a coarse mask for each object to supervise our method. The proposed module can be easily integrated into any region-based detector to improve detection. Extensive experiments on the ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvements when the BoxMask module is plugged into numerous recent state-of-the-art methods.
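The abstract states only that each object's bounding box serves as a coarse mask and that the pixel-level information is class-aware. The sketch below illustrates one way such supervision could look, assuming a Mask R-CNN-style setup: each proposal RoI receives an M x M target grid rasterized from its matched ground-truth box, and a per-class mask head is supervised only on the channel of the ground-truth class. The function names, the 14 x 14 default grid, and the cell-center sampling rule are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in image coordinates.
Box = Tuple[float, float, float, float]


def box_to_coarse_mask(roi: Box, gt_box: Box, size: int = 14) -> List[List[float]]:
    """Rasterize gt_box onto a size x size grid spanning the proposal roi.

    A cell is 1.0 if its center lies inside gt_box, else 0.0, so the
    bounding box itself acts as a coarse (rectangular) mask target.
    """
    rx1, ry1, rx2, ry2 = roi
    gx1, gy1, gx2, gy2 = gt_box
    cell_w = (rx2 - rx1) / size
    cell_h = (ry2 - ry1) / size
    mask = []
    for i in range(size):
        cy = ry1 + (i + 0.5) * cell_h  # center of row i
        row = []
        for j in range(size):
            cx = rx1 + (j + 0.5) * cell_w  # center of column j
            inside = gx1 <= cx <= gx2 and gy1 <= cy <= gy2
            row.append(1.0 if inside else 0.0)
        mask.append(row)
    return mask


def class_aware_mask_loss(
    logits: Sequence[Sequence[Sequence[float]]],  # [num_classes][size][size]
    gt_class: int,
    target: Sequence[Sequence[float]],  # [size][size] coarse box mask
) -> float:
    """Mean binary cross-entropy computed only on the ground-truth class channel."""
    channel = logits[gt_class]
    total = 0.0
    n = 0
    for logit_row, target_row in zip(channel, target):
        for z, t in zip(logit_row, target_row):
            # Numerically stable BCE with logits: max(z, 0) - z*t + log(1 + exp(-|z|)).
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            n += 1
    return total / n


if __name__ == "__main__":
    roi = (0.0, 0.0, 8.0, 8.0)
    gt = (0.0, 0.0, 4.0, 8.0)  # ground-truth box covers the left half of the RoI
    mask = box_to_coarse_mask(roi, gt, size=4)
    for row in mask:
        print(row)
    # Two classes, all-zero logits: the loss only reads channel gt_class.
    zero_logits = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
    print(class_aware_mask_loss(zero_logits, gt_class=1, target=mask))
```

Because only the ground-truth class channel contributes to the loss, the other class channels receive no gradient from this object. This mirrors how per-class mask heads are commonly trained and is one plausible reading of "class-aware" here.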

Results

Task              Dataset       Metric    Value  Model
Object Detection  ImageNet VID  mAP (%)   84.8   BoxMask (ResNeXt-101)
Object Detection  ImageNet VID  mAP (%)   80.7   BoxMask (ResNet-50)

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
- ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
- Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)