Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment

Robin Karlsson, Tomoki Hayashi, Keisuke Fujii, Alexander Carballo, Kento Ohtani, Kazuya Takeda

2021-11-24 · Representation Learning · Self-Supervised Learning · Unsupervised Semantic Segmentation · Domain Generalization · Semantic Segmentation · Word Embeddings · Contrastive Learning

Paper · PDF · Code (official)

Abstract

Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening the door for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve self-supervised learning of dense, semantically rich visual concept embeddings. Decomposing images into a small set of visually coherent regions reduces the computational complexity by $\mathcal{O}(1000)$ while preserving detail. We experimentally show that contrasting over regions improves the effectiveness of contrastive learning methods, extends their applicability to high-resolution images, and improves overclustering performance; that superpixels are better than grids; and that regional masking improves performance. The expressiveness of our dense embeddings is demonstrated by improving on the state-of-the-art unsupervised semantic segmentation benchmark on Cityscapes, and for convolutional models on COCO.
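The core efficiency argument is that contrasting region-level embeddings instead of pixel-level ones shrinks the number of contrastive elements from H×W pixels to a few hundred regions. A minimal sketch of that region-pooling step, assuming dense features of shape (H, W, D) and a precomputed superpixel label map (the function name and shapes here are illustrative, not the paper's actual implementation):

```python
import numpy as np

def pool_features_by_region(features, labels):
    """Average-pool dense features over superpixel regions.

    features: (H, W, D) float array of per-pixel embeddings.
    labels:   (H, W) int array of superpixel ids in 0..K-1.
    Returns:  (K, D) array of region embeddings, one per superpixel.
    """
    H, W, D = features.shape
    flat_feat = features.reshape(-1, D)
    flat_lab = labels.reshape(-1)
    K = int(flat_lab.max()) + 1

    # Sum features into each region's slot, then divide by pixel counts.
    sums = np.zeros((K, D), dtype=float)
    np.add.at(sums, flat_lab, flat_feat)
    counts = np.bincount(flat_lab, minlength=K).astype(float)
    return sums / counts[:, None]
```

For a 1024x2048 Cityscapes image decomposed into a few hundred superpixels, this reduces roughly two million pixel embeddings to a few hundred region embeddings before any contrastive loss is computed, which is the ~O(1000) reduction the abstract refers to.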

Results

Task                               Dataset          Metric                 Value   Model
Semantic Segmentation              Cityscapes test  Accuracy               84.3    ViCE
Semantic Segmentation              Cityscapes test  mIoU                   25.2    ViCE
Semantic Segmentation              COCO-Stuff-27    Clustering [Accuracy]  64.8    ViCE
Semantic Segmentation              COCO-Stuff-27    Clustering [mIoU]      21.77   ViCE
Unsupervised Semantic Segmentation Cityscapes test  Accuracy               84.3    ViCE
Unsupervised Semantic Segmentation Cityscapes test  mIoU                   25.2    ViCE
Unsupervised Semantic Segmentation COCO-Stuff-27    Clustering [Accuracy]  64.8    ViCE
Unsupervised Semantic Segmentation COCO-Stuff-27    Clustering [mIoU]      21.77   ViCE

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)