Ishan Misra, Rohit Girdhar, Armand Joulin
We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block for future research.
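To make the two ingredients named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "non-parametric queries" can be illustrated by picking query locations from the input cloud with farthest point sampling, and that the Fourier positional embedding takes the form of sin/cos features of random Gaussian projections. The function names, the number of frequencies, and the assumption that coordinates are already normalized are all hypothetical choices made for illustration.

```python
import numpy as np


def farthest_point_sample(points, num_queries, seed=0):
    """Greedily pick `num_queries` indices of well-spread points (assumed query sampler)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(num_queries, dtype=np.int64)
    chosen[0] = rng.integers(n)  # arbitrary starting point
    # Squared distance from every point to its nearest already-chosen point.
    min_dist = np.full(n, np.inf)
    for i in range(1, num_queries):
        delta = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", delta, delta))
        chosen[i] = int(np.argmax(min_dist))  # farthest from the chosen set
    return chosen


def fourier_embedding(xyz, freq_matrix):
    """Map (N, 3) coordinates to (N, 2 * F) features using sin/cos of projections.

    `freq_matrix` has shape (3, F); drawing it from a Gaussian is an assumption
    of this sketch, as is the expectation that `xyz` is already normalized.
    """
    projected = 2.0 * np.pi * (xyz @ freq_matrix)
    return np.concatenate([np.sin(projected), np.cos(projected)], axis=1)


# Toy point cloud: 2048 points in the cube [-1, 1]^3.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(2048, 3))

# The queries are sampled point locations, not learned parameters.
query_idx = farthest_point_sample(points, num_queries=128)
freq_matrix = rng.normal(size=(3, 64))
query_embed = fourier_embedding(points[query_idx], freq_matrix)
print(query_embed.shape)  # (128, 128)
```

In a full model the resulting embeddings would presumably pass through learned layers before entering the Transformer decoder; that part is omitted here.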
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Object Detection | SUN-RGBD val | mAP@0.25 | 59.1 | 3DETR-m |
| 3D Object Detection | SUN-RGBD val | mAP@0.5 | 32.7 | 3DETR-m |
| 3D Object Detection | ScanNetV2 | mAP@0.25 | 65 | 3DETR-m |
| 3D Object Detection | ScanNetV2 | mAP@0.5 | 47 | 3DETR-m |
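
The suffixes in mAP@0.25 and mAP@0.5 are the 3D intersection-over-union (IoU) thresholds at which a predicted box counts as matching a ground-truth box. The sketch below illustrates that matching rule under a simplifying assumption: boxes are axis-aligned and given as `(min_corner, max_corner)` tuples. Benchmark evaluation code may also handle oriented boxes, which this example does not. The helper names are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as ((x0, y0, z0), (x1, y1, z1))."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(box_a, box_b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    intersection = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # boxes are disjoint along this axis
        intersection *= overlap
    union = box_volume(box_a) + box_volume(box_b) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(gt_box, pred_box, threshold):
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou_3d(gt_box, pred_box) >= threshold


# A prediction shifted by half the box width along x.
gt = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
pred = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))

print(round(iou_3d(gt, pred), 3))   # 0.333
print(is_match(gt, pred, 0.25))     # True
print(is_match(gt, pred, 0.5))      # False
```

The same prediction is a hit at the 0.25 threshold and a miss at 0.5, which is why mAP@0.5 is the stricter of the two reported metrics.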