Haiyang Wang, Lihe Ding, Shaocong Dong, Shaoshuai Shi, Aoxue Li, Jianan Li, Zhenguo Li, LiWei Wang
We present CAGroup3D, a novel two-stage, fully sparse convolutional framework for 3D object detection. The method first generates high-quality 3D proposals by applying a class-aware local grouping strategy to object surface voxels that share the same semantic prediction, which accounts for the semantic consistency and diverse locality that previous bottom-up approaches abandoned. Then, to recover the features of voxels missed because of incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module that directly aggregates fine-grained spatial information from the backbone for further proposal refinement. This module is efficient in both memory and computation and better encodes the geometry-specific features of each 3D proposal. Our model achieves state-of-the-art 3D detection performance, with gains of +3.6% on ScanNet V2 and +2.6% on SUN RGB-D in terms of mAP@0.25. Code will be available at https://github.com/Haiyang-W/CAGroup3D.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Object Detection | SUN-RGBD val | mAP@0.25 | 66.8 | CAGroup3D (Geo only) |
| 3D Object Detection | SUN-RGBD val | mAP@0.5 | 50.2 | CAGroup3D (Geo only) |
| 3D Object Detection | ScanNetV2 | mAP@0.25 | 75.1 | CAGroup3D |
| 3D Object Detection | ScanNetV2 | mAP@0.5 | 61.3 | CAGroup3D |
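
The suffix in mAP@0.25 and mAP@0.5 is the 3D IoU threshold at which a predicted box counts as matching a ground-truth box. The sketch below shows that threshold check for axis-aligned boxes only; oriented boxes, as used on SUN RGB-D, need a more involved overlap computation. The function name and box layout are assumptions for illustration, not part of any benchmark toolkit.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for d in range(3):
        lo = max(box_a[d], box_b[d])
        hi = min(box_a[d + 3], box_b[d + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(box_a) + volume(box_b) - inter)


# Two 2x2x2 boxes, the second shifted by 1 along x.
pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
iou = iou_3d_axis_aligned(pred, gt)

matches_at_025 = iou >= 0.25
matches_at_05 = iou >= 0.5
```

Here the overlap volume is 4 and the union is 12, so the IoU is 1/3: the prediction counts as a match under the 0.25 threshold but not under the stricter 0.5 threshold, which is why mAP@0.5 values are lower than mAP@0.25 values for the same model.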