Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yonghong Tian, Li Yuan
As a promising scheme of self-supervised learning, masked autoencoding has significantly advanced natural language processing and computer vision. Inspired by this, we propose a neat scheme of masked autoencoders for point cloud self-supervised learning, addressing the challenges posed by the properties of point clouds, including leakage of location information and uneven information density. Concretely, we divide the input point cloud into irregular point patches and randomly mask them at a high ratio. Then, a standard Transformer-based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from the unmasked point patches, aiming to reconstruct the masked point patches. Extensive experiments show that our approach is efficient during pre-training and generalizes well to various downstream tasks. Specifically, our pre-trained models achieve 85.18% accuracy on ScanObjectNN and 94.04% accuracy on ModelNet40, outperforming all other self-supervised learning methods. We show that, with our scheme, a simple architecture based entirely on standard Transformers can surpass dedicated Transformer models from supervised learning. Our approach also advances state-of-the-art accuracies by 1.5%-2.3% in few-shot object classification. Furthermore, our work suggests the feasibility of applying unified architectures from language and images to point clouds.
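The patching and masking step described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how patches are formed, so the grouping below assumes farthest point sampling for patch centers and k-nearest neighbours for patch membership. The function names, the patch count (64), the patch size (32), and the 60% mask ratio are all assumptions chosen for the example, as is expressing each patch relative to its center.

```python
import numpy as np


def farthest_point_sample(points, num_centers, rng):
    """Pick well-spread center indices (assumed grouping strategy)."""
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    # Squared distance from every point to its nearest chosen center so far.
    nearest = np.full(n, np.inf)
    chosen[0] = rng.integers(n)
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        nearest = np.minimum(nearest, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(nearest))
    return chosen


def group_patches(points, num_patches, patch_size, rng):
    """Split a cloud of shape (N, 3) into patches of shape (G, K, 3)."""
    centers = points[farthest_point_sample(points, num_patches, rng)]
    # Pairwise squared distances between centers and all points: (G, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]
    # Assumption: store each patch as offsets from its own center.
    patches = points[knn_idx] - centers[:, None, :]
    return centers, patches


def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask with True marking the patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))        # synthetic stand-in for a scan
centers, patches = group_patches(cloud, num_patches=64, patch_size=32, rng=rng)
mask = random_mask(num_patches=64, mask_ratio=0.6, rng=rng)

visible = patches[~mask]   # what an encoder would see
masked = patches[mask]     # reconstruction targets
print(visible.shape, masked.shape)
```

In this sketch only `visible` would be passed to an encoder, while `masked` serves as the reconstruction target, which mirrors the asymmetric design the abstract mentions.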
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 90.02 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 88.29 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 85.2 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 95 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 96.3 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 2.5 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 97.8 | Point-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 1.8 | Point-MAE |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 90.02 | Point-MAE |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 88.29 | Point-MAE |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 85.2 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 95 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 96.3 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2.5 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 97.8 | Point-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 1.8 | Point-MAE |
| Point Cloud Segmentation | PointCloud-C | mean Corruption Error (mCE) | 0.927 | Point-MAE |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 90.02 | Point-MAE |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 88.29 | Point-MAE |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 85.2 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 94 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 95 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 96.3 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 2.5 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 97.8 | Point-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 1.8 | Point-MAE |