Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

Chanyoung Kim, Woojung Han, Dayun Ju, Seong Jae Hwang

Published 2024-03-03 · CVPR 2024
Tasks: Representation Learning, Unsupervised Semantic Segmentation, Segmentation, Semantic Segmentation, Semantic Similarity, Semantic Textual Similarity

Abstract

Semantic segmentation has innately relied on extensive pixel-level annotated data, leading to the emergence of unsupervised methodologies. Among them, leveraging self-supervised Vision Transformers for unsupervised semantic segmentation (USS) has been making steady progress with expressive deep features. Yet, for semantically segmenting images with complex objects, a predominant challenge remains: the lack of explicit object-level semantic encoding in patch-level features. This technical limitation often leads to inadequate segmentation of complex objects with diverse structures. To address this gap, we present a novel approach, EAGLE, which emphasizes object-centric representation learning for unsupervised semantic segmentation. Specifically, we introduce EiCue, a spectral technique providing semantic and structural cues through an eigenbasis derived from the semantic similarity matrix of deep image features and color affinity from an image. Further, by incorporating our object-centric contrastive loss with EiCue, we guide our model to learn object-level representations with intra- and inter-image object-feature consistency, thereby enhancing semantic accuracy. Extensive experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results of EAGLE with accurate and consistent semantic segmentation across complex scenes.
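The spectral idea behind EiCue can be illustrated with a generic sketch: build a patch-to-patch affinity from cosine similarity of deep features, optionally add a color affinity, and take the eigenvectors of the normalized graph Laplacian with the smallest eigenvalues. This is not the authors' implementation. The function name `eigenbasis`, the additive combination with `color_weight`, and the clipping of negative similarities are all assumptions made for illustration.

```python
import numpy as np

def eigenbasis(features, num_vectors=4, color_affinity=None, color_weight=1.0):
    """Return the smallest eigenpairs of a normalized graph Laplacian.

    features: (N, D) array of patch features (hypothetical input).
    color_affinity: optional (N, N) non-negative matrix (hypothetical input).
    """
    feats = np.asarray(features, dtype=np.float64)
    # Cosine similarity between every pair of patches.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)
    # Assumption: negative similarities carry no affinity, so clip them to 0.
    adj = np.clip(unit @ unit.T, 0.0, None)
    if color_affinity is not None:
        # Assumption: semantic and color affinities are simply added.
        adj = adj + color_weight * np.asarray(color_affinity, dtype=np.float64)
    np.fill_diagonal(adj, 0.0)  # no self-loops

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.clip(deg, 1e-12, None))
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals[:num_vectors], eigvecs[:, :num_vectors]

# Two loose groups of 2-D "patch features".
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.05],
                  [0.0, 1.0], [0.1, 1.0], [0.05, 0.9]])
vals, vecs = eigenbasis(feats, num_vectors=2)
print(np.sign(vecs[:, 1]))  # sign pattern of the second eigenvector
```

In this toy input the sign of the second eigenvector splits the six patches into the two groups, which is the kind of structural cue that spectral methods extract from an affinity matrix.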

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | Potsdam-3 | Accuracy | 83.3 | EAGLE (DINO, ViT-B/8)
Semantic Segmentation | Potsdam-3 | mIoU | 71.1 | EAGLE (DINO, ViT-B/8)
Semantic Segmentation | Cityscapes test | Accuracy | 79.4 | EAGLE (DINO, ViT-B/8)
Semantic Segmentation | Cityscapes test | mIoU | 22.1 | EAGLE (DINO, ViT-B/8)
Semantic Segmentation | Cityscapes test | Accuracy | 81.8 | EAGLE (DINO, ViT-S/8)
Semantic Segmentation | Cityscapes test | mIoU | 19.7 | EAGLE (DINO, ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 64.2 | EAGLE (DINO, ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 27.2 | EAGLE (DINO, ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 76.8 | EAGLE (DINO, ViT-S/8)
Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 43.9 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | Potsdam-3 | Accuracy | 83.3 | EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation | Potsdam-3 | mIoU | 71.1 | EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation | Cityscapes test | Accuracy | 79.4 | EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation | Cityscapes test | mIoU | 22.1 | EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation | Cityscapes test | Accuracy | 81.8 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | Cityscapes test | mIoU | 19.7 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 64.2 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 27.2 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 76.8 | EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation | COCO-Stuff-27 | Linear Classifier [mIoU] | 43.9 | EAGLE (DINO, ViT-S/8)
10-shot image generation | Potsdam-3 | Accuracy | 83.3 | EAGLE (DINO, ViT-B/8)
10-shot image generation | Potsdam-3 | mIoU | 71.1 | EAGLE (DINO, ViT-B/8)
10-shot image generation | Cityscapes test | Accuracy | 79.4 | EAGLE (DINO, ViT-B/8)
10-shot image generation | Cityscapes test | mIoU | 22.1 | EAGLE (DINO, ViT-B/8)
10-shot image generation | Cityscapes test | Accuracy | 81.8 | EAGLE (DINO, ViT-S/8)
10-shot image generation | Cityscapes test | mIoU | 19.7 | EAGLE (DINO, ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 64.2 | EAGLE (DINO, ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 27.2 | EAGLE (DINO, ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [Accuracy] | 76.8 | EAGLE (DINO, ViT-S/8)
10-shot image generation | COCO-Stuff-27 | Linear Classifier [mIoU] | 43.9 | EAGLE (DINO, ViT-S/8)
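
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how pixel accuracy and mIoU are commonly computed from a confusion matrix. It is not the evaluation code behind these numbers. The helper names are hypothetical, and the sketch assumes predicted cluster IDs have already been mapped to ground-truth class IDs (for clustering results that mapping step is not shown here).

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count pixels per (target class, predicted class) pair."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Encode each (target, pred) pair as a single index, then histogram.
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the target class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean intersection-over-union across classes that appear at all."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both pred and target
    return (tp[valid] / union[valid]).mean()

# Four pixels, two classes: one class-0 pixel is mislabeled as class 1.
target = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
cm = confusion_matrix(pred, target, num_classes=2)
print(pixel_accuracy(cm))  # 0.75
print(mean_iou(cm))        # (1/2 + 2/3) / 2, about 0.583
```

Because mIoU averages per-class overlap rather than counting pixels, a model can score high accuracy and low mIoU on the same dataset when a few large classes dominate the image area.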
