EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

Chanyoung Kim, Woojung Han, Dayun Ju, Seong Jae Hwang

2024-03-03, CVPR 2024

Tasks: Representation Learning, Unsupervised Semantic Segmentation, Segmentation, Semantic Segmentation, Semantic Similarity, Semantic Textual Similarity


Abstract

Semantic segmentation has innately relied on extensive pixel-level annotated data, leading to the emergence of unsupervised methodologies. Among them, leveraging self-supervised Vision Transformers for unsupervised semantic segmentation (USS) has been making steady progress with expressive deep features. Yet, for semantically segmenting images with complex objects, a predominant challenge remains: the lack of explicit object-level semantic encoding in patch-level features. This technical limitation often leads to inadequate segmentation of complex objects with diverse structures. To address this gap, we present a novel approach, EAGLE, which emphasizes object-centric representation learning for unsupervised semantic segmentation. Specifically, we introduce EiCue, a spectral technique providing semantic and structural cues through an eigenbasis derived from the semantic similarity matrix of deep image features and color affinity from an image. Further, by incorporating our object-centric contrastive loss with EiCue, we guide our model to learn object-level representations with intra- and inter-image object-feature consistency, thereby enhancing semantic accuracy. Extensive experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results of EAGLE with accurate and consistent semantic segmentation across complex scenes.
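The abstract describes EiCue only at a high level: an eigenbasis derived from a semantic similarity matrix of deep features together with a color affinity from the image. The following is a minimal sketch of that general idea, not the authors' implementation. How the two affinities are combined (a weighted sum here), the Gaussian color kernel, the symmetric normalized Laplacian, and all names (`combined_affinity`, `laplacian_eigenbasis`, `toy_patches`, `color_weight`, `sigma`) are assumptions made for illustration.

```python
import numpy as np


def combined_affinity(features, colors, color_weight=1.0, sigma=0.5):
    """Build a patch-to-patch affinity matrix (assumed form, not from the paper).

    features: (n, d) array of patch features.
    colors:   (n, 3) array of mean patch colors in [0, 1].
    """
    # Semantic similarity: cosine similarity, clipped so affinities are >= 0.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    semantic = np.clip(f @ f.T, 0.0, None)
    # Color affinity: Gaussian kernel on squared pairwise color distances.
    sq_dist = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(axis=-1)
    color = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return semantic + color_weight * color


def laplacian_eigenbasis(affinity, k):
    """Return the k smallest eigenpairs of L = I - D^{-1/2} A D^{-1/2}."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    laplacian = np.eye(affinity.shape[0]) - normalized
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:k], eigenvectors[:, :k]


def toy_patches(seed=0):
    """Eight hypothetical patches: four 'red' ones and four 'blue' ones."""
    rng = np.random.default_rng(seed)
    feats = np.vstack([np.tile([1.0, 0.2], (4, 1)), np.tile([0.2, 1.0], (4, 1))])
    cols = np.vstack([np.tile([0.9, 0.1, 0.1], (4, 1)),
                      np.tile([0.1, 0.1, 0.9], (4, 1))])
    feats = feats + 0.01 * rng.standard_normal(feats.shape)
    return feats, cols


if __name__ == "__main__":
    feats, cols = toy_patches()
    vals, vecs = laplacian_eigenbasis(combined_affinity(feats, cols), k=2)
    # The sign of the second eigenvector gives a two-way grouping of the patches.
    print(vecs[:, 1] > 0)
```

On the toy input, the sign of the second eigenvector splits the patches into the two groups they were built from. In a setting like the one the abstract describes, such eigenvectors would presumably serve as the semantic and structural cue that guides the object-level contrastive loss; that step is not sketched here because the abstract does not specify it.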

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	Potsdam-3	Accuracy	83.3	EAGLE (DINO, ViT-B/8)
Semantic Segmentation	Potsdam-3	mIoU	71.1	EAGLE (DINO, ViT-B/8)
Semantic Segmentation	Cityscapes test	Accuracy	79.4	EAGLE (DINO, ViT-B/8)
Semantic Segmentation	Cityscapes test	mIoU	22.1	EAGLE (DINO, ViT-B/8)
Semantic Segmentation	Cityscapes test	Accuracy	81.8	EAGLE (DINO, ViT-S/8)
Semantic Segmentation	Cityscapes test	mIoU	19.7	EAGLE (DINO, ViT-S/8)
Semantic Segmentation	COCO-Stuff-27	Clustering [Accuracy]	64.2	EAGLE (DINO, ViT-S/8)
Semantic Segmentation	COCO-Stuff-27	Clustering [mIoU]	27.2	EAGLE (DINO, ViT-S/8)
Semantic Segmentation	COCO-Stuff-27	Linear Classifier [Accuracy]	76.8	EAGLE (DINO, ViT-S/8)
Semantic Segmentation	COCO-Stuff-27	Linear Classifier [mIoU]	43.9	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	Potsdam-3	Accuracy	83.3	EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation	Potsdam-3	mIoU	71.1	EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation	Cityscapes test	Accuracy	79.4	EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation	Cityscapes test	mIoU	22.1	EAGLE (DINO, ViT-B/8)
Unsupervised Semantic Segmentation	Cityscapes test	Accuracy	81.8	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	Cityscapes test	mIoU	19.7	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	COCO-Stuff-27	Clustering [Accuracy]	64.2	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	COCO-Stuff-27	Clustering [mIoU]	27.2	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	COCO-Stuff-27	Linear Classifier [Accuracy]	76.8	EAGLE (DINO, ViT-S/8)
Unsupervised Semantic Segmentation	COCO-Stuff-27	Linear Classifier [mIoU]	43.9	EAGLE (DINO, ViT-S/8)
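The table reports pixel accuracy and mIoU, with separate "Clustering" and "Linear Classifier" rows for COCO-Stuff-27. This page does not state the evaluation protocol, so the sketch below only illustrates how such numbers are commonly computed in unsupervised segmentation: predicted cluster IDs are first matched one-to-one to ground-truth classes, then accuracy and mIoU are read off the confusion matrix. The matching here is a brute-force search over permutations, a stand-in for Hungarian matching that is only practical for a handful of classes. All function names and the toy labels are hypothetical.

```python
from itertools import permutations


def confusion_matrix(pred, target, num_classes):
    """matrix[t][p] counts pixels with ground-truth class t and prediction p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def best_cluster_mapping(matrix):
    """Return mapping[cluster] = class that maximizes correctly matched pixels.

    Brute force over all permutations; use Hungarian matching for many classes.
    """
    n = len(matrix)
    best, best_hits = None, -1
    for perm in permutations(range(n)):
        hits = sum(matrix[perm[c]][c] for c in range(n))
        if hits > best_hits:
            best, best_hits = perm, hits
    return list(best)


def accuracy_and_miou(pred, target, num_classes):
    """Pixel accuracy and mean IoU over classes that appear in pred or target."""
    matrix = confusion_matrix(pred, target, num_classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(num_classes))
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return correct / total, sum(ious) / len(ious)


if __name__ == "__main__":
    target = [0, 0, 0, 1, 1, 2, 2, 2]    # toy ground-truth labels
    clusters = [2, 2, 2, 0, 0, 1, 1, 0]  # toy cluster IDs with arbitrary numbering
    mapping = best_cluster_mapping(confusion_matrix(clusters, target, 3))
    remapped = [mapping[c] for c in clusters]
    print(mapping, accuracy_and_miou(remapped, target, 3))
```

For the toy labels, the mapping is `[1, 2, 0]`, accuracy is 7/8, and mIoU is 7/9. A linear-classifier evaluation would skip the matching step, since a supervised probe already predicts class indices directly.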
