Zhuoxu Huang, Zhiyou Zhao, Banghuai Li, Jungong Han
Transformers, with their underlying attention mechanism and their ability to capture long-range dependencies, are a natural choice for unordered point cloud data. However, the isolated local regions produced by the common sampling architecture corrupt the structural information of instances, and the inherent relationships between adjacent local regions remain unexplored, even though local structural information is crucial in a transformer-based 3D point cloud model. In this paper, we therefore propose a novel module named Local Context Propagation (LCP) that exploits message passing between neighboring local regions to make their representations more informative and discriminative. Specifically, we use the points shared by adjacent local regions (which we show statistically to be prevalent) as intermediaries, and re-weight the features that these shared points receive from different local regions before passing them to the next layer. Inserting the LCP module between two transformer layers significantly improves network expressiveness. Finally, we design a flexible LCPFormer architecture equipped with the LCP module. The proposed method is applicable to different tasks and outperforms various transformer-based methods on benchmarks including 3D shape classification and dense prediction tasks such as 3D object detection and semantic segmentation. Code will be released for reproduction.
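The re-weighting of shared points described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The kNN grouping around randomly chosen centers (farthest point sampling is the more usual sampler), the scoring vector `w`, and the per-point softmax over a point's occurrences are all assumptions made for illustration. The function names `knn_groups`, `overlap_ratio`, and `local_context_propagation` are likewise hypothetical. Every point that appears in several local regions receives a weighted average of its features from those regions, and that fused feature is written back to each occurrence before the next layer. Points that belong to a single region pass through unchanged.

```python
import numpy as np

def knn_groups(xyz, centers_idx, k):
    """Build local regions: the k nearest points (by Euclidean distance) to each center.

    xyz: (N, 3) point coordinates; centers_idx: (M,) indices of region centers.
    Returns an (M, k) array of point indices, one row per local region.
    """
    centers = xyz[centers_idx]                                     # (M, 3)
    d2 = ((centers[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)    # (M, N) squared distances
    return np.argsort(d2, axis=1)[:, :k]                           # (M, k)

def overlap_ratio(groups, n_points):
    """Fraction of grouped points that belong to more than one local region."""
    counts = np.bincount(groups.ravel(), minlength=n_points)
    used = counts > 0
    return float((counts[used] > 1).mean())

def local_context_propagation(group_feats, groups, n_points, w):
    """Hypothetical LCP-style step: fuse each shared point's features across regions.

    group_feats: (M, k, C) per-region point features (e.g. from a transformer layer).
    groups: (M, k) point indices; w: (C,) scoring vector, an assumed stand-in
    for learned parameters. Returns features of the same shape (M, k, C).
    """
    M, k, C = group_feats.shape
    flat_idx = groups.ravel()                       # (M*k,) which point each slot holds
    flat_feat = group_feats.reshape(-1, C)          # (M*k, C)
    scores = flat_feat @ w                          # (M*k,) one score per occurrence

    # Numerically stable softmax over all occurrences of the same point.
    max_per_point = np.full(n_points, -np.inf)
    np.maximum.at(max_per_point, flat_idx, scores)
    e = np.exp(scores - max_per_point[flat_idx])
    denom = np.zeros(n_points)
    np.add.at(denom, flat_idx, e)
    alpha = e / denom[flat_idx]                     # weights sum to 1 for each point

    # Weighted sum of each point's features over the regions that share it.
    fused = np.zeros((n_points, C))
    np.add.at(fused, flat_idx, alpha[:, None] * flat_feat)

    # Scatter the fused feature back to every occurrence for the next layer.
    return fused[flat_idx].reshape(M, k, C)

# Demo on random data: 64 regions of 16 points drawn from 256 points.
rng = np.random.default_rng(0)
xyz = rng.random((256, 3))
centers_idx = rng.choice(256, size=64, replace=False)   # random centers (FPS is typical)
groups = knn_groups(xyz, centers_idx, k=16)
ratio = overlap_ratio(groups, 256)
feats = rng.standard_normal((64, 16, 32))
out = local_context_propagation(feats, groups, 256, w=rng.standard_normal(32))
print(f"shared-point ratio: {ratio:.2f}, output shape: {out.shape}")

# Tiny hand-checkable case: point 1 is shared by both regions. With w = 0 the
# weights are equal, so its two features (2 and 4) are averaged to 3 everywhere.
tiny = local_context_propagation(
    np.array([[[1.0], [2.0]], [[4.0], [6.0]]]), np.array([[0, 1], [1, 2]]), 3, np.zeros(1)
)
print(tiny.ravel())
```

Because the weights are normalized per point, each fused feature is a convex combination of that point's per-region features. In a trained network the fixed vector `w` would presumably be replaced by learned parameters (for example a small MLP), which is also an assumption of this sketch.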
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| 3D Semantic Segmentation | S3DIS Area5 | mAcc | 76.8 | LCPFormer |
| 3D Semantic Segmentation | S3DIS Area5 | mIoU | 70.2 | LCPFormer |
| 3D Semantic Segmentation | S3DIS Area5 | oAcc | 90.8 | LCPFormer |
| 3D Semantic Segmentation | SensatUrban | mIoU | 63.4 | LCPFormer |
| 3D Object Detection | SUN-RGBD val | mAP@0.25 | 63.2 | LCPFormer |
| 3D Object Detection | SUN-RGBD val | mAP@0.5 | 46.2 | LCPFormer |
| 3D Shape Classification | ModelNet40 | Mean Accuracy | 90.7 | LCPFormer |
| 3D Shape Classification | ModelNet40 | Overall Accuracy | 93.6 | LCPFormer |