Remco F. Leijenaar, Hamidreza Kasaei
Learning semantically meaningful representations from unstructured 3D point clouds remains a central challenge in computer vision, especially in the absence of large-scale labeled datasets. While masked point modeling (MPM) is widely used in self-supervised 3D learning, its reconstruction-based objective can limit its ability to capture high-level semantics. We propose AsymDSD, an Asymmetric Dual Self-Distillation framework that unifies masked modeling and invariance learning through prediction in the latent space rather than the input space. AsymDSD builds on a joint embedding architecture and introduces several key design choices: an efficient asymmetric setup, disabling attention between masked queries to prevent shape leakage, multi-mask sampling, and a point cloud adaptation of multi-crop. AsymDSD achieves state-of-the-art results on ScanObjectNN (90.53%) and further improves to 93.72% when pretrained on 930k shapes, surpassing prior methods.
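One of the design choices named above, disabling attention between masked queries, can be illustrated with a boolean attention mask. The following is a minimal sketch, not the authors' implementation: the function name `build_attention_mask`, the token layout, and the choice to let each masked query attend to itself and to all visible tokens are assumptions made for illustration only.

```python
import numpy as np

def build_attention_mask(is_masked):
    """Return allowed[i, j] = True if token i may attend to token j.

    Hypothetical helper: assumes a single sequence in which
    is_masked[k] marks token k as a masked query.
    """
    is_masked = np.asarray(is_masked, dtype=bool)
    n = is_masked.shape[0]
    # Every token may attend to the visible (unmasked) tokens.
    allowed = np.tile(~is_masked, (n, 1))
    # Assumption: each token may also attend to itself, so a masked
    # query sees visible context plus itself, never other masked queries.
    allowed |= np.eye(n, dtype=bool)
    return allowed

# Tokens 1 and 3 are masked queries; tokens 0 and 2 are visible.
is_masked = np.array([False, True, False, True])
mask = build_attention_mask(is_masked)
```

Under these assumptions, a masked query cannot read positional or feature information from another masked query, which is one way to keep the shape of the masked region from leaking into the prediction.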
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 96.73 | AsymDSD-B* (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 94.32 | AsymDSD-B* (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 93.72 | AsymDSD-B* (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 94.32 | AsymDSD-S (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 91.91 | AsymDSD-S (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 90.53 | AsymDSD-S (no voting) |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94.7 | AsymDSD-B* (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 96.73 | AsymDSD-B* (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 94.32 | AsymDSD-B* (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 93.72 | AsymDSD-B* (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 94.32 | AsymDSD-S (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 91.91 | AsymDSD-S (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 90.53 | AsymDSD-S (no voting) |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94.7 | AsymDSD-B* (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 96.73 | AsymDSD-B* (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 94.32 | AsymDSD-B* (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 93.72 | AsymDSD-B* (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 94.32 | AsymDSD-S (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 91.91 | AsymDSD-S (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 90.53 | AsymDSD-S (no voting) |
| 3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 94.7 | AsymDSD-B* (no voting) |