Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla, Timo Ropinski

2023-09-21 · CVPR 2024
Tasks: Unsupervised Semantic Segmentation, Semantic Segmentation, Unsupervised Panoptic Segmentation

Abstract

Traditionally, training neural networks to perform semantic segmentation required expensive human-made annotations. More recently, advances in unsupervised learning have made significant progress on this issue and toward closing the gap to supervised algorithms. To achieve this, semantic knowledge is distilled by learning to correlate randomly sampled features from images across an entire dataset. In this work, we build upon these advances by incorporating information about the structure of the scene into the training process through the use of depth information. We achieve this by (1) learning depth-feature correlation, spatially correlating the feature maps with the depth maps to induce knowledge about the structure of the scene, and (2) implementing farthest-point sampling to select relevant features more effectively by applying 3D sampling techniques to the depth information of the scene. Finally, we demonstrate the effectiveness of our technical contributions through extensive experimentation and present significant improvements in performance across multiple benchmark datasets.
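To make step (2) of the abstract more concrete, here is a minimal sketch of generic farthest-point sampling over a depth map. It is not the authors' implementation: the function names `depth_to_points` and `farthest_point_sampling` are hypothetical, and treating (row, column, depth) as a 3D coordinate is a simplifying assumption that ignores camera intrinsics.

```python
import math


def depth_to_points(depth):
    """Flatten a 2D depth grid into (row, col, depth) points.

    Assumption: pixel indices stand in for x/y coordinates; a real
    pipeline would unproject pixels with camera intrinsics instead.
    """
    return [
        (float(r), float(c), float(d))
        for r, row in enumerate(depth)
        for c, d in enumerate(row)
    ]


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices that are spread far apart.

    Each step selects the point whose distance to its nearest
    already-selected point is largest.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Adding a point can only shrink the nearest-selected distances.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 3x3 depth map: flat background with one distant corner pixel.
depth = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
]
points = depth_to_points(depth)
indices = farthest_point_sampling(points, k=3)
```

In this toy case the second selected index is the distant corner pixel, since it lies farthest from the starting point. Compared with uniform random sampling, this greedy rule tends to cover the scene's 3D extent with fewer samples, which is the intuition the abstract gives for using it to choose feature locations.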

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 58.6 | DepthG (ViT-B/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 29 | DepthG (ViT-B/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 75.5 | DepthG (ViT-B/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 41.6 | DepthG (ViT-B/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 55.1 | DepthG w/ 3D-LHP (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 26.7 | DepthG w/ 3D-LHP (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.9 | DepthG w/ 3D-LHP (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 37.8 | DepthG w/ 3D-LHP (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 56.3 | DepthG (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 25.6 | DepthG (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.7 | DepthG (ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 38.9 | DepthG (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 58.6 | DepthG (ViT-B/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 29 | DepthG (ViT-B/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 75.5 | DepthG (ViT-B/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 41.6 | DepthG (ViT-B/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 55.1 | DepthG w/ 3D-LHP (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 26.7 | DepthG w/ 3D-LHP (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.9 | DepthG w/ 3D-LHP (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 37.8 | DepthG w/ 3D-LHP (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 56.3 | DepthG (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 25.6 | DepthG (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.7 | DepthG (ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 38.9 | DepthG (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 58.6 | DepthG (ViT-B/8)
10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 29 | DepthG (ViT-B/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 75.5 | DepthG (ViT-B/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [mIoU] | 41.6 | DepthG (ViT-B/8)
10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 55.1 | DepthG w/ 3D-LHP (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 26.7 | DepthG w/ 3D-LHP (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.9 | DepthG w/ 3D-LHP (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [mIoU] | 37.8 | DepthG w/ 3D-LHP (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 56.3 | DepthG (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 25.6 | DepthG (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 73.7 | DepthG (ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [mIoU] | 38.9 | DepthG (ViT-S/8)
Unsupervised Panoptic Segmentation | Cityscapes | PQ | 16.1 | DepthG + CutLER
2D Panoptic Segmentation | Cityscapes | PQ | 16.1 | DepthG + CutLER
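
For readers unfamiliar with the Accuracy and mIoU metrics reported above, the following sketch shows how both are commonly computed from a confusion matrix. It is an illustration, not the benchmark's evaluation code: the function names are hypothetical, and it assumes predicted labels have already been mapped to ground-truth class ids (in the clustering setting this mapping is typically found by matching clusters to classes before scoring). The functions return fractions in [0, 1]; the table reports the same quantities as percentages.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count label pairs: cm[t][p] = pixels of true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # true class c, predicted other
        fp = sum(row[c] for row in cm) - tp   # predicted c, true class other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Six toy "pixels" over three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
accuracy = pixel_accuracy(cm)
miou = mean_iou(cm)
```

Because mIoU averages per-class scores, rare classes weigh as much as frequent ones, which is one reason the mIoU values in the table sit well below the corresponding accuracy values.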

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)