Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

An End-to-End Transformer Model for 3D Object Detection

Ishan Misra, Rohit Girdhar, Armand Joulin

2021-09-16 | ICCV 2021 10 | object-detection, 3D Object Detection, Object Detection

Abstract

We propose 3DETR, an end-to-end Transformer based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block for future research.
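The Fourier positional embeddings mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one common form, random Fourier features, in which an (x, y, z) coordinate is projected through a fixed random matrix and passed through sine and cosine. The function names, `num_freqs`, `scale`, and `seed` are hypothetical choices made for illustration only.

```python
import math
import random


def make_fourier_basis(in_dim=3, num_freqs=4, scale=1.0, seed=0):
    """Sample a fixed Gaussian projection matrix of shape (in_dim, num_freqs).

    Assumption: the basis is drawn once and never trained.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(num_freqs)]
            for _ in range(in_dim)]


def fourier_embed(point, basis):
    """Embed one point as [sin(2*pi*p.B), cos(2*pi*p.B)]."""
    num_freqs = len(basis[0])
    # Project the coordinate onto each random frequency vector.
    proj = [
        2.0 * math.pi * sum(point[d] * basis[d][k] for d in range(len(point)))
        for k in range(num_freqs)
    ]
    # Concatenate the sine half and the cosine half.
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


basis = make_fourier_basis()
emb = fourier_embed((0.1, 0.5, 0.9), basis)
print(len(emb))  # 8 (num_freqs sines followed by num_freqs cosines)
```

The output dimension is twice `num_freqs`. In a detection model the coordinates would typically be normalized to the scene extent before embedding, which this sketch leaves to the caller.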

Results

Task                 Dataset       Metric    Value  Model
Object Detection     SUN-RGBD val  mAP@0.25  59.1   3DETR-m
Object Detection     SUN-RGBD val  mAP@0.5   32.7   3DETR-m
Object Detection     ScanNetV2     mAP@0.25  65     3DETR-m
Object Detection     ScanNetV2     mAP@0.5   47     3DETR-m
3D                   SUN-RGBD val  mAP@0.25  59.1   3DETR-m
3D                   SUN-RGBD val  mAP@0.5   32.7   3DETR-m
3D                   ScanNetV2     mAP@0.25  65     3DETR-m
3D                   ScanNetV2     mAP@0.5   47     3DETR-m
3D Object Detection  SUN-RGBD val  mAP@0.25  59.1   3DETR-m
3D Object Detection  SUN-RGBD val  mAP@0.5   32.7   3DETR-m
3D Object Detection  ScanNetV2     mAP@0.25  65     3DETR-m
3D Object Detection  ScanNetV2     mAP@0.5   47     3DETR-m
2D Classification    SUN-RGBD val  mAP@0.25  59.1   3DETR-m
2D Classification    SUN-RGBD val  mAP@0.5   32.7   3DETR-m
2D Classification    ScanNetV2     mAP@0.25  65     3DETR-m
2D Classification    ScanNetV2     mAP@0.5   47     3DETR-m
2D Object Detection  SUN-RGBD val  mAP@0.25  59.1   3DETR-m
2D Object Detection  SUN-RGBD val  mAP@0.5   32.7   3DETR-m
2D Object Detection  ScanNetV2     mAP@0.25  65     3DETR-m
2D Object Detection  ScanNetV2     mAP@0.5   47     3DETR-m
16k                  SUN-RGBD val  mAP@0.25  59.1   3DETR-m
16k                  SUN-RGBD val  mAP@0.5   32.7   3DETR-m
16k                  ScanNetV2     mAP@0.25  65     3DETR-m
16k                  ScanNetV2     mAP@0.5   47     3DETR-m
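
The suffix in mAP@0.25 and mAP@0.5 is the 3D intersection-over-union (IoU) threshold at which a predicted box counts as matching a ground-truth box. The sketch below shows that matching test only, under a simplifying assumption: boxes are axis-aligned and given as (xmin, ymin, zmin, xmax, ymax, zmax). Benchmark evaluation code may use oriented boxes and also handles ranking and per-class averaging, none of which is shown here. The names `box3d_iou` and `is_match` are hypothetical.

```python
def box3d_iou(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            # No overlap along this axis means no overlap at all.
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def is_match(pred, gt, threshold):
    """A prediction matches when its IoU with the ground truth meets the threshold."""
    return box3d_iou(pred, gt) >= threshold


gt = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)  # same size, shifted by half its width
print(is_match(pred, gt, 0.25))  # True  (IoU is 4 / 12, about 0.33)
print(is_match(pred, gt, 0.5))   # False
```

The example shows why the mAP@0.5 values in the table are lower than the mAP@0.25 values: a box that is offset by half its width still counts at the looser threshold but not at the stricter one.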

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)