Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Point2Vec for Self-Supervised Representation Learning on Point Clouds

Karim Abou Zeid, Jonas Schult, Alexander Hermans, Bastian Leibe

2023-03-29
Few-Shot Learning, Representation Learning, Self-Supervised Learning, Few-Shot 3D Point Cloud Classification, 3D Part Segmentation, 3D Point Cloud Classification

Abstract

Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking and thus hinders data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.
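
The masked student-teacher idea mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly described data2vec-style recipe, in which the teacher's weights are an exponential moving average of the student's weights and the student regresses the teacher's targets at masked positions. The function names, the decay value `tau`, and the toy vectors are hypothetical and are not taken from the point2vec code.

```python
import random


def ema_update(teacher, student, tau):
    """Blend student weights into the teacher: t <- tau * t + (1 - tau) * s."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches hidden from the student."""
    num_masked = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))


def masked_mse(student_pred, teacher_target, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(student_pred[i], teacher_target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical setup: 8 point patches, 75% of them masked.
    masked = random_mask(num_patches=8, mask_ratio=0.75, rng=rng)
    print("masked patch indices:", masked)

    # Toy 2-dimensional features per patch (illustrative values only).
    teacher_target = [[1.0, 0.0] for _ in range(8)]
    student_pred = [[0.5, 0.5] for _ in range(8)]
    print("loss on masked patches:", masked_mse(student_pred, teacher_target, masked))

    # After a student optimizer step, the teacher follows slowly.
    teacher_weights = ema_update([1.0, 0.0], [0.0, 1.0], tau=0.9)
    print("teacher weights after EMA step:", teacher_weights)
```

The sketch leaves out the networks themselves and the handling of positional information that the abstract identifies as the point-cloud-specific problem; it only shows the masking, the regression target, and the slow teacher update.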

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ShapeNet-Part | Class Average IoU | 84.6 | point2vec |
| Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 86.3 | point2vec |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Mean Accuracy | 86 | point2vec |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 91.2 | point2vec |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 90.4 | point2vec |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 87.5 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Mean Accuracy | 92 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94.8 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.8 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 3.1 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 2.8 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 93.9 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.7 | point2vec |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 1.2 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | Mean Accuracy | 86 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 91.2 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 90.4 | point2vec |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 87.5 | point2vec |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 92 | point2vec |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 3.1 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2.8 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 93.9 | point2vec |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.7 | point2vec |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 1.2 | point2vec |
| 10-shot image generation | ShapeNet-Part | Class Average IoU | 84.6 | point2vec |
| 10-shot image generation | ShapeNet-Part | Instance Average IoU | 86.3 | point2vec |
| 3D Point Cloud Reconstruction | ScanObjectNN | Mean Accuracy | 86 | point2vec |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 91.2 | point2vec |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 90.4 | point2vec |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 87.5 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 | Mean Accuracy | 92 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 94.8 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.8 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 3.1 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 2.8 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 93.9 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.7 | point2vec |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 1.2 | point2vec |

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)