Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Self-Supervised Learning of Object Parts for Semantic Segmentation

Adrian Ziegler, Yuki M. Asano

2022-04-27 · CVPR 2022
Tasks: Unsupervised Image Segmentation, Representation Learning, Self-Supervised Learning, Unsupervised Semantic Segmentation, Community Detection, Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Progress in self-supervised learning has brought strong general image representation learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from this trend as they require spatially-diverse representations. However, learning dense representations is challenging, as in the unsupervised context it is not clear how to guide the model to learn representations that correspond to various potential object categories. In this paper, we argue that self-supervised learning of object parts is a solution to this issue. Object parts are generalizable: they are a priori independent of an object definition, but can be grouped to form objects a posteriori. To this end, we leverage the recently proposed Vision Transformer's capability of attending to objects and combine it with a spatially dense clustering task for fine-tuning the spatial tokens. Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 17%-3%, showing that our representations are versatile under various object definitions. Finally, we extend this to fully unsupervised segmentation - which refrains completely from using label information even at test-time - and demonstrate that a simple method for automatically merging discovered object parts based on community detection yields substantial gains.
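The abstract's last step, merging discovered object parts through community detection, can be illustrated with a small sketch. Everything here is an assumption made for illustration: parts are treated as graph nodes, the edge weights are hypothetical affinities (for example, how often two parts appear in the same image), and asynchronous label propagation stands in for whichever community detection algorithm the authors actually use. The function name `merge_parts` is likewise hypothetical.

```python
from collections import defaultdict


def merge_parts(num_parts, weighted_edges, max_iters=100):
    """Group part ids into communities with asynchronous label propagation.

    weighted_edges: iterable of (part_a, part_b, weight) tuples. The weights
    are hypothetical affinities, not values taken from the paper.
    Returns a sorted list of part-id lists, one list per community.
    """
    # Build an undirected weighted adjacency map.
    neighbors = defaultdict(dict)
    for a, b, w in weighted_edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    labels = list(range(num_parts))  # every part starts in its own community
    for _ in range(max_iters):
        changed = False
        for node in range(num_parts):
            # Sum the edge weight pointing at each neighboring label.
            scores = defaultdict(float)
            for other, w in neighbors[node].items():
                scores[labels[other]] += w
            if not scores:
                continue  # an isolated part keeps its own label
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # On ties, keep the current label if possible, else take the smallest.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable

    groups = defaultdict(list)
    for node, label in enumerate(labels):
        groups[label].append(node)
    return sorted(groups.values())


# Two tightly connected triples of parts joined by one weak edge.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 0.1),
]
print(merge_parts(6, edges))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy graph the weak edge between parts 2 and 3 is outweighed by the strong edges inside each triple, so the six parts merge into two groups, which is the kind of part-to-object grouping the abstract describes.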

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 47.2 | Leopart (ViT-B/8)
Semantic Segmentation | PASCAL VOC 2012 val | FCN [mIoU] | 76.3 | Leopart (ViT-B/8)
Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 41.7 | Leopart (ViT-S/16)
Semantic Segmentation | PASCAL VOC 2012 val | FCN [mIoU] | 71.4 | Leopart (ViT-S/16)
Semantic Segmentation | PASCAL VOC 2012 val | Linear Classifier [mIoU] | 69.3 | Leopart (ViT-S/16)
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 47.2 | Leopart (ViT-B/8)
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | FCN [mIoU] | 76.3 | Leopart (ViT-B/8)
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 41.7 | Leopart (ViT-S/16)
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | FCN [mIoU] | 71.4 | Leopart (ViT-S/16)
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Linear Classifier [mIoU] | 69.3 | Leopart (ViT-S/16)
10-shot image generation | PASCAL VOC 2012 val | Clustering [mIoU] | 47.2 | Leopart (ViT-B/8)
10-shot image generation | PASCAL VOC 2012 val | FCN [mIoU] | 76.3 | Leopart (ViT-B/8)
10-shot image generation | PASCAL VOC 2012 val | Clustering [mIoU] | 41.7 | Leopart (ViT-S/16)
10-shot image generation | PASCAL VOC 2012 val | FCN [mIoU] | 71.4 | Leopart (ViT-S/16)
10-shot image generation | PASCAL VOC 2012 val | Linear Classifier [mIoU] | 69.3 | Leopart (ViT-S/16)
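
Every value above is a mean intersection-over-union (mIoU) score. As a minimal sketch of how such a score is computed, assume predictions and ground-truth labels are flat lists of integer class ids. The function name `mean_iou` and the optional `ignore_index` parameter are illustrative choices, not the benchmark's evaluation code. For the clustering rows, cluster ids would first have to be matched to class ids, a step not shown here.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean intersection-over-union across classes present in pred or target.

    pred, target: equal-length sequences of integer class ids in [0, num_classes).
    ignore_index: optional ground-truth label to skip (an assumption here,
    e.g. for a "void" label).
    """
    # confusion[t][p] counts pixels with ground-truth class t predicted as p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # class c pixels that were missed
        fp = sum(row[c] for row in confusion) - tp   # other pixels predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both pred and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # 0.5
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. The table reports the same quantity scaled to a percentage.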

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)