Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub

2025-03-20, CVPR 2025
Tasks: Spatial Reasoning, Self-Supervised Learning, Semantic Segmentation, 3D Semantic Segmentation

Abstract

In this paper, we question whether we have a reliable self-supervised point cloud model that can be used for diverse 3D tasks via simple linear probing, even with limited data and minimal computation. We find that existing 3D self-supervised learning approaches fall short when evaluated on representation quality through linear probing. We hypothesize that this is due to what we term the "geometric shortcut", which causes representations to collapse to low-level spatial features. This challenge is unique to 3D and arises from the sparse nature of point cloud data. We address it through two key strategies: obscuring spatial information and enhancing the reliance on input features, ultimately composing a Sonata of 140k point clouds through self-distillation. Sonata is simple and intuitive, yet its learned representations are strong and reliable: zero-shot visualizations demonstrate semantic grouping, alongside strong spatial reasoning through nearest-neighbor relationships. Sonata demonstrates exceptional parameter and data efficiency, tripling linear probing accuracy (from 21.8% to 72.5%) on ScanNet and nearly doubling performance with only 1% of the data compared to previous approaches. Full fine-tuning further advances SOTA across both 3D indoor and outdoor perception tasks.
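The abstract evaluates representation quality through linear probing: the pretrained encoder stays frozen and only a single linear layer is trained on top of its features. The following is a minimal sketch of that protocol on toy data, not the authors' implementation. The `frozen_encoder` function is a hypothetical stand-in (a fixed random projection), the toy point clusters are synthetic, and the optimizer is plain gradient descent chosen for brevity.

```python
import numpy as np


def frozen_encoder(points):
    """Hypothetical stand-in for a pretrained encoder (NOT Sonata).

    A fixed random projection followed by tanh. Its weights are
    regenerated from the same seed on every call and never updated,
    which mimics a frozen backbone.
    """
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(points.shape[1], 16))
    return np.tanh(points @ weights)


def train_linear_probe(feats, labels, num_classes, lr=0.1, steps=300):
    """Train only a linear layer (W, b) with softmax cross-entropy."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(feats, W, b):
    """Per-point class prediction from the linear layer."""
    return (feats @ W + b).argmax(axis=1)


def make_toy_points(rng, n_per_class):
    """Two synthetic, well-separated 3D clusters with per-point labels."""
    centers = np.array([[2.0, 2.0, 2.0], [-2.0, -2.0, -2.0]])
    pts = np.concatenate(
        [c + 0.3 * rng.normal(size=(n_per_class, 3)) for c in centers]
    )
    labels = np.repeat(np.arange(2), n_per_class)
    return pts, labels


rng = np.random.default_rng(42)
train_pts, train_lbl = make_toy_points(rng, 200)
test_pts, test_lbl = make_toy_points(rng, 200)

# The encoder is only called, never trained; the probe is the sole learner.
W, b = train_linear_probe(frozen_encoder(train_pts), train_lbl, num_classes=2)
acc = (predict(frozen_encoder(test_pts), W, b) == test_lbl).mean()
print(f"linear-probe accuracy on toy data: {acc:.3f}")
```

Because the probe has so little capacity, its accuracy mostly reflects how linearly separable the frozen features already are, which is why the abstract treats it as a measure of representation quality.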

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Semantic Segmentation | ScanNet | val mIoU | 79.4 | Sonata + PTv3
Semantic Segmentation | S3DIS Area5 | mAcc | 81.6 | Sonata + PTv3
Semantic Segmentation | S3DIS Area5 | mIoU | 76 | Sonata + PTv3
Semantic Segmentation | S3DIS Area5 | oAcc | 93 | Sonata + PTv3
Semantic Segmentation | S3DIS | Mean IoU | 82.3 | Sonata + PTv3
Semantic Segmentation | S3DIS | mAcc | 89.9 | Sonata + PTv3
Semantic Segmentation | S3DIS | oAcc | 93.3 | Sonata + PTv3
Semantic Segmentation | ScanNet200 | val mIoU | 36.8 | Sonata + PTv3
Semantic Segmentation | ScanNet++ | Top-1 IoU | 0.495 | Sonata
Semantic Segmentation | ScanNet++ | Top-3 IoU | 0.735 | Sonata
3D Semantic Segmentation | ScanNet200 | val mIoU | 36.8 | Sonata + PTv3
3D Semantic Segmentation | ScanNet++ | Top-1 IoU | 0.495 | Sonata
3D Semantic Segmentation | ScanNet++ | Top-3 IoU | 0.735 | Sonata
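
The results above are reported as mIoU, mAcc, and oAcc. As a rough illustration of how these three metrics relate, the sketch below computes them from a confusion matrix using their standard definitions. The function names and the toy labels are illustrative assumptions, and skipping classes that never appear in the ground truth is one common convention, not necessarily the one used by each benchmark's official evaluation script.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts points with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return mIoU, mAcc, and oAcc as fractions in [0, 1]."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    ious, accs = [], []
    for i in range(k):
        tp = cm[i][i]
        gt = sum(cm[i])                         # points whose true class is i
        pred = sum(cm[j][i] for j in range(k))  # points predicted as class i
        if gt == 0:
            continue  # assumption: ignore classes absent from the ground truth
        ious.append(tp / (gt + pred - tp))      # intersection over union
        accs.append(tp / gt)                    # per-class accuracy (recall)
    if not ious:
        raise ValueError("no labeled points in the confusion matrix")
    return {
        "mIoU": sum(ious) / len(ious),
        "mAcc": sum(accs) / len(accs),
        "oAcc": correct / total,
    }


# Toy example with three classes and six points (made-up labels).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, 3))
print({name: round(value, 3) for name, value in metrics.items()})
# {'mIoU': 0.778, 'mAcc': 0.889, 'oAcc': 0.833}
```

oAcc weights every point equally, so frequent classes dominate it, while mIoU and mAcc average over classes and are more sensitive to rare ones.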

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning (2025-07-16)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)