Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

2023-08-08 · 3D Object Detection · Object Detection

Abstract

We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method which computes position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects such as data normalization based on our understanding of the task. We show exceptional results on the challenging ScanNetV2 benchmark, achieving significant improvements over the previous 3DETR in $\mathrm{AP}_{25}$/$\mathrm{AP}_{50}$ from 65.0%/47.0% to 77.8%/66.0%, respectively. In addition, our method sets a new record on ScanNetV2 and SUN RGB-D datasets. Code will be released at http://github.com/yichaoshen-MS/V-DETR.
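The abstract describes 3DV-RPE as encoding each point by its position relative to the 3D box a query predicts. The following is a minimal sketch of only that geometric step, under stated assumptions: boxes are axis-aligned and given as a center plus full size, and the function names (`box_vertices`, `vertex_relative_offsets`, `is_inside`) are hypothetical, not taken from the V-DETR code. How the offsets are turned into a position encoding or attention signal is not specified in the abstract and is not modeled here.

```python
from itertools import product


def box_vertices(center, size):
    """Return the 8 corners of an axis-aligned box (assumption: no rotation).

    center: (cx, cy, cz); size: full extents (sx, sy, sz).
    Corners are ordered so index 0 is the min corner and index 7 the max corner.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1.0, 1.0), repeat=3)
    ]


def vertex_relative_offsets(points, center, size):
    """For every point, compute its offset to each of the 8 box vertices."""
    verts = box_vertices(center, size)
    return [
        [(px - vx, py - vy, pz - vz) for vx, vy, vz in verts]
        for px, py, pz in points
    ]


def is_inside(offsets_for_point):
    """A point is inside the box when it lies at or beyond the min corner
    and at or before the max corner on every axis."""
    to_min_corner = offsets_for_point[0]
    to_max_corner = offsets_for_point[-1]
    return all(o >= 0 for o in to_min_corner) and all(o <= 0 for o in to_max_corner)


# Illustrative values only: one point at the box center, one far outside.
points = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
offsets = vertex_relative_offsets(points, center=(0.0, 0.0, 0.0), size=(2.0, 2.0, 2.0))
print([is_inside(o) for o in offsets])
```

The signs of the offsets already tell whether a point is inside or outside the predicted box, which is one way to see how vertex-relative positions could carry the locality information the abstract refers to.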

Results

Task                 Dataset       Metric    Value  Model
Object Detection     SUN-RGBD val  mAP@0.25  68     V-DETR
Object Detection     SUN-RGBD val  mAP@0.5   51.1   V-DETR
Object Detection     ScanNetV2     mAP@0.25  77.8   V-DETR
Object Detection     ScanNetV2     mAP@0.5   65.9   V-DETR
3D                   SUN-RGBD val  mAP@0.25  68     V-DETR
3D                   SUN-RGBD val  mAP@0.5   51.1   V-DETR
3D                   ScanNetV2     mAP@0.25  77.8   V-DETR
3D                   ScanNetV2     mAP@0.5   65.9   V-DETR
3D Object Detection  SUN-RGBD val  mAP@0.25  68     V-DETR
3D Object Detection  SUN-RGBD val  mAP@0.5   51.1   V-DETR
3D Object Detection  ScanNetV2     mAP@0.25  77.8   V-DETR
3D Object Detection  ScanNetV2     mAP@0.5   65.9   V-DETR
2D Classification    SUN-RGBD val  mAP@0.25  68     V-DETR
2D Classification    SUN-RGBD val  mAP@0.5   51.1   V-DETR
2D Classification    ScanNetV2     mAP@0.25  77.8   V-DETR
2D Classification    ScanNetV2     mAP@0.5   65.9   V-DETR
2D Object Detection  SUN-RGBD val  mAP@0.25  68     V-DETR
2D Object Detection  SUN-RGBD val  mAP@0.5   51.1   V-DETR
2D Object Detection  ScanNetV2     mAP@0.25  77.8   V-DETR
2D Object Detection  ScanNetV2     mAP@0.5   65.9   V-DETR
16k                  SUN-RGBD val  mAP@0.25  68     V-DETR
16k                  SUN-RGBD val  mAP@0.5   51.1   V-DETR
16k                  ScanNetV2     mAP@0.25  77.8   V-DETR
16k                  ScanNetV2     mAP@0.5   65.9   V-DETR
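
The page does not define the metrics. Reading mAP@0.25 and mAP@0.5 in the usual way, the number after "@" is the 3D IoU threshold a predicted box must reach against a ground-truth box to count as a match. The sketch below illustrates only that matching criterion, assuming axis-aligned boxes in a hypothetical `(xmin, ymin, zmin, xmax, ymax, zmax)` layout. It is not the benchmarks' evaluation code, and a full mAP additionally ranks predictions by confidence and averages over classes.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes.

    Each box is (xmin, ymin, zmin, xmax, ymax, zmax); this layout is an
    assumption made for illustration.
    """
    intersection = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis, so no overlap at all
        intersection *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


def matches(pred_box, gt_box, threshold):
    """True when the prediction overlaps the ground truth by at least `threshold`."""
    return iou_3d_axis_aligned(pred_box, gt_box) >= threshold


# Illustrative boxes: the prediction is shifted by half its width along x.
gt = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
print(matches(pred, gt, 0.25), matches(pred, gt, 0.5))
```

With these example boxes the overlap is one third of the union, so the same prediction counts at the 0.25 threshold but not at 0.5, which is why the @0.5 rows are the stricter ones.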

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)