Binjie Chen, Yunzhou Xia, Yu Zang, Cheng Wang, Jonathan Li
The unstructured nature of point clouds demands that local aggregation be adaptive to different local structures. Previous methods meet this demand by explicitly embedding spatial relations into each aggregation process. Although this coupled approach has been shown to be effective in generating clear semantics, aggregation can be greatly slowed down by repeated relation learning and by redundant computation to mix directional and point features. In this work, we propose to decouple the explicit modelling of spatial relations from local aggregation. We theoretically prove that basic neighbor pooling operations can also function without loss of clarity in feature fusion, so long as essential spatial information has been encoded in point features. As an instantiation of decoupled local aggregation, we present DeLA, a lightweight point network in which, at each learning stage, relative spatial encodings are formed first, and then only pointwise convolutions plus edge max-pooling are used for local aggregation. Further, a regularization term is employed to reduce potential ambiguity through the prediction of relative coordinates. Though conceptually simple, DeLA achieves state-of-the-art performance with reduced or comparable latency in experiments on five classic benchmarks. Specifically, DeLA achieves over 90% overall accuracy on ScanObjectNN and 74% mIoU on S3DIS Area 5. Our code is available at https://github.com/Matrix-ASC/DeLA.
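
The decoupling described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the DeLA repository: the function names, the brute-force k-NN search, the random weights, and the single-layer encoders are all assumptions made for illustration. In particular, "edge max-pooling" is read here as a max over neighbor-minus-center feature differences, which is one plausible interpretation rather than a confirmed detail of the paper.

```python
import numpy as np

def knn_indices(xyz, k):
    """Brute-force k nearest neighbors (each point is its own first neighbor)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    return np.argsort(d2, axis=1)[:, :k]                     # (N, k)

def spatial_encoding(xyz, idx, w_rel):
    """Step 1: encode relative coordinates once per stage (hypothetical one-layer encoder)."""
    rel = xyz[idx] - xyz[:, None, :]                # (N, k, 3) neighbor offsets
    return np.maximum(rel @ w_rel, 0).max(axis=1)   # (N, C) pooled encoding

def decoupled_aggregation(feat, idx, w_point):
    """Step 2: pointwise convolution, then edge max-pooling; no spatial relations used here."""
    h = np.maximum(feat @ w_point, 0)               # shared linear map + ReLU, (N, C)
    return (h[idx] - h[:, None, :]).max(axis=1)     # max over (neighbor - center), (N, C)

rng = np.random.default_rng(0)
N, k, C = 16, 4, 8
xyz = rng.normal(size=(N, 3))       # toy point cloud
w_rel = rng.normal(size=(3, C))     # assumed encoder weights
w_point = rng.normal(size=(C, C))   # assumed pointwise-convolution weights

idx = knn_indices(xyz, k)
feat = spatial_encoding(xyz, idx, w_rel)  # spatial information now lives in the features
out = feat
for _ in range(2):                        # repeated aggregation reuses idx, never xyz
    out = decoupled_aggregation(out, idx, w_point)
```

In this sketch the coordinates `xyz` are consulted only when the encoding is formed; the aggregation loop sees nothing but features and neighbor indices, which is the source of the claimed savings over recomputing relations at every aggregation.
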
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScanNet | val mIoU | 75.9 | DeLA |
| Semantic Segmentation | S3DIS Area5 | mAcc | 80 | DeLA |
| Semantic Segmentation | S3DIS Area5 | mIoU | 74.1 | DeLA |
| Semantic Segmentation | S3DIS Area5 | oAcc | 92.2 | DeLA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Mean Accuracy | 89.3 | DeLA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 90.4 | DeLA |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Mean Accuracy | 92.2 | DeLA |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94 | DeLA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | GFLOPs | 1.5 | DeLA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Number of params (M) | 5.3 | DeLA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy (PB_T50_RS) | 90.4 | DeLA |
| 3D Point Cloud Classification | ScanObjectNN | Mean Accuracy | 89.3 | DeLA |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 90.4 | DeLA |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 92.2 | DeLA |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94 | DeLA |
| 3D Point Cloud Classification | ScanObjectNN | GFLOPs | 1.5 | DeLA |
| 3D Point Cloud Classification | ScanObjectNN | Number of params (M) | 5.3 | DeLA |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy (PB_T50_RS) | 90.4 | DeLA |