Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

ZiChao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen

2024-08-13 | object-detection | 3D Object Detection | Object Detection

Abstract

We introduce MV-DETR, a novel transformer-based detection pipeline that is both effective and efficient. Given RGB-D input, we observe that very strong pretrained weights exist for RGB data, whereas pretraining for depth-related data is far less effective. First, we argue that geometry and texture cues are both vital but can be encoded separately. Second, we find that visual texture features are harder to extract than geometry features in 3D space. Unfortunately, a single RGB-D dataset with only thousands of samples is not enough to train a discriminative filter for visual texture feature extraction. Last but not least, we design a lightweight VG module consisting of a visual textual encoder, a geometry encoder, and a VG connector. Compared with previous state-of-the-art work such as V-DETR, gains from the pretrained visual encoder can be observed. Extensive experiments on the ScanNetV2 dataset show the effectiveness of our method. Notably, our method achieves 78% AP, setting a new state of the art on the ScanNetV2 benchmark.

Results

Task                 Dataset    Metric    Value  Model
Object Detection     ScanNetV2  mAP@0.25  78     UDeerMvDETR
Object Detection     ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
3D                   ScanNetV2  mAP@0.25  78     UDeerMvDETR
3D                   ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
3D Object Detection  ScanNetV2  mAP@0.25  78     UDeerMvDETR
3D Object Detection  ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
2D Classification    ScanNetV2  mAP@0.25  78     UDeerMvDETR
2D Classification    ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
2D Object Detection  ScanNetV2  mAP@0.25  78     UDeerMvDETR
2D Object Detection  ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
16k                  ScanNetV2  mAP@0.25  78     UDeerMvDETR
16k                  ScanNetV2  mAP@0.5   65.8   UDeerMvDETR
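
The suffixes in mAP@0.25 and mAP@0.5 are IoU thresholds: a predicted 3D box counts as a match only if its intersection-over-union with a ground-truth box reaches the threshold. The sketch below illustrates that overlap test, assuming axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax). The function name and box layout are hypothetical and not taken from the paper, and the exact comparison convention (strict or non-strict) depends on the evaluation script.

```python
def aabb_iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Illustrative helper only; the name and box layout are assumptions.
    """
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])          # start of the overlap on this axis
        hi = min(a[i + 3], b[i + 3])  # end of the overlap on this axis
        if hi <= lo:
            return 0.0                # no overlap on one axis means no overlap at all
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


# Two 2x2x2 boxes shifted by 1 along x: intersection 4, union 12.
pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
iou = aabb_iou_3d(pred, gt)

matches_at_025 = iou >= 0.25  # counted as a match for mAP@0.25
matches_at_05 = iou >= 0.5    # not counted as a match for mAP@0.5
```

In this toy case the same prediction is a hit under the looser threshold and a miss under the stricter one, which is why mAP@0.5 is typically lower than mAP@0.25 for the same model.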

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)