Axel Berg, Magnus Oskarsson, Mark O'Connor
While the Transformer architecture has become ubiquitous in the machine learning field, its adaptation to 3D shape recognition is non-trivial. Because self-attention has quadratic computational complexity in the number of inputs, it quickly becomes inefficient as the set of input points grows. Furthermore, we find that the attention mechanism struggles to find useful connections between individual points on a global scale. To alleviate these problems, we propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms, enabling both individual points and patches of points to attend to each other effectively. Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer, while also being more computationally efficient. We also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.
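
The efficiency argument can be illustrated by counting query-key pairs. The following is a minimal sketch, not the authors' implementation: it assumes the point cloud is split into non-overlapping patches of equal size and that each patch contributes a single token to the global stage. The function names and the example sizes are hypothetical and are not taken from the paper.

```python
def global_attention_pairs(num_points: int) -> int:
    """Query-key pairs when every point attends to every other point."""
    return num_points * num_points


def two_stage_attention_pairs(num_patches: int, patch_size: int) -> int:
    """Query-key pairs for a local stage followed by a global stage.

    Assumption (illustrative only): points attend within their own patch
    in the local stage, and one token per patch attends to all patch
    tokens in the global stage.
    """
    local_pairs = num_patches * patch_size * patch_size
    global_pairs = num_patches * num_patches
    return local_pairs + global_pairs
```

Under these assumptions, 1024 points give `global_attention_pairs(1024) == 1048576`, while 64 patches of 16 points give `two_stage_attention_pairs(64, 16) == 20480`, roughly a 51-fold reduction in attention pairs per layer. The actual savings depend on how patches are formed and how many of them are used.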
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Point Cloud Registration | 3DMatch Benchmark | Feature Matching Recall | 96.8 | DIP + Point-TnT |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Mean Accuracy | 81 | Point-TnT |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 83.5 | Point-TnT |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 92.6 | Point-TnT |
| 3D Point Cloud Classification | ScanObjectNN | Mean Accuracy | 81 | Point-TnT |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 83.5 | Point-TnT |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 92.6 | Point-TnT |
| 3D Point Cloud Interpolation | 3DMatch Benchmark | Feature Matching Recall | 96.8 | DIP + Point-TnT |
| 3D Point Cloud Reconstruction | ScanObjectNN | Mean Accuracy | 81 | Point-TnT |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 83.5 | Point-TnT |
| 3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 92.6 | Point-TnT |