Karim Abou Zeid, Jonas Schult, Alexander Hermans, Bastian Leibe
Recently, the self-supervised learning framework data2vec has shown strong performance across various modalities using a masked student-teacher approach. However, it remains an open question whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking, and thus prevents data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNet-Part. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.
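
To make the masked student-teacher idea concrete, the following is a minimal sketch of two generic ingredients of data2vec-style pre-training: choosing which patches are hidden from the student, and updating the teacher as an exponential moving average (EMA) of the student. It is not the authors' implementation. The function names, the mask ratio of 0.6, the decay `tau = 0.9`, and the toy weight lists are all assumptions chosen for illustration.

```python
import random


def split_masked(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are hidden from the student.

    Returns (visible, masked) index lists. The mask ratio is a
    hypothetical setting, not a value taken from the paper.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


def ema_update(teacher, student, tau):
    """EMA teacher update: t <- tau * t + (1 - tau) * s for every weight."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
visible, masked = split_masked(num_patches=10, mask_ratio=0.6, rng=rng)

# Toy "weights": the teacher drifts slowly toward the student.
teacher = ema_update(teacher=[1.0, 1.0], student=[0.0, 2.0], tau=0.9)
```

In this setup the student would be trained to predict the teacher's representations at the `masked` indices while seeing only the `visible` patches. The positional leakage described in the abstract arises when the student is additionally given the positions of the masked patches, since those positions alone outline the object's shape.
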
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Semantic Segmentation | ShapeNet-Part | Class Average IoU | 84.6 | point2vec |
| Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 86.3 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | Mean Accuracy | 86 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 91.2 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 90.4 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 87.5 | point2vec |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 92 | point2vec |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 3.1 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 93.9 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.7 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 1.2 | point2vec |
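
The table reports both "Overall Accuracy" and "Mean Accuracy". Assuming these follow the usual definitions (overall accuracy is the fraction of all samples classified correctly, mean accuracy is the average of the per-class accuracies), the sketch below shows how the two can diverge on imbalanced data. The labels are invented toy data, not results from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label is correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example: four chairs classified correctly, one lamp misclassified.
y_true = ["chair", "chair", "chair", "chair", "lamp"]
y_pred = ["chair", "chair", "chair", "chair", "chair"]

oa = overall_accuracy(y_true, y_pred)       # 4 of 5 correct
macc = mean_class_accuracy(y_true, y_pred)  # (1.0 + 0.0) / 2
```

Because rare classes weigh as much as frequent ones in the mean, a model that neglects small classes scores lower on mean accuracy than on overall accuracy, which is consistent with the gap between the two metrics in the table.
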