Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in point cloud processing. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attention. Specifically, exploiting local spatial coherence, local features are learned through Relative Position Encoding and Attentive Feature Pooling. We then incorporate the learned locality into the Transformer module: the local feature modulates the value component in the Transformer, adjusting the relationships between the channels of each point, which enhances the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT
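The core idea of modulating the Transformer's value component with a learned local feature can be sketched as follows. This is a minimal NumPy illustration of the mechanism described in the abstract, not the paper's implementation: the gating function, layer structure, and how `local_feat` is produced (Relative Position Encoding plus Attentive Feature Pooling in IBT) are all simplified assumptions here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def locality_modulated_attention(x, local_feat, Wq, Wk, Wv):
    """Self-attention over N points where a per-point local feature
    gates the value channels before attention is applied.

    x          : (N, C) point features
    local_feat : (N, C) locally aggregated features (stand-in for the
                 output of IBT's local feature learning; assumed shape)
    Wq, Wk, Wv : (C, C) projection matrices
    """
    q = x @ Wq
    k = x @ Wk
    v = x @ Wv
    # Locality-based channel interaction: a sigmoid gate derived from the
    # local feature rescales each channel of the value (assumed gating form).
    gate = 1.0 / (1.0 + np.exp(-local_feat))
    v = v * gate
    # Standard scaled dot-product attention over all points (global part).
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v
```

The gate lets spatially coherent neighborhoods emphasize or suppress individual channels of each point before the global attention mixes information across all points, which is one way to read "locality-based channel interaction."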
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Point Cloud Classification | ModelNet40 | Classification Accuracy | 93.6 | Ours |
| Part Segmentation | ShapeNet-Part | Instance Average IoU | 86.2 | Ours |
| Point Cloud Classification | ISPRS | Average F1 | 82.8 | Ours |