Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment

Robin Karlsson, Tomoki Hayashi, Keisuke Fujii, Alexander Carballo, Kento Ohtani, Kazuya Takeda

2021-11-24 · Representation Learning · Self-Supervised Learning · Unsupervised Semantic Segmentation · Domain Generalization · Semantic Segmentation · Word Embeddings · Contrastive Learning

Paper · PDF · Code (official)

Abstract

Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening the door for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve self-supervised learning of dense, semantically rich visual concept embeddings. Decomposing images into a small set of visually coherent regions reduces the computational complexity by $\mathcal{O}(1000)$ while preserving detail. We experimentally show that contrasting over regions improves the effectiveness of contrastive learning methods, extends their applicability to high-resolution images, and improves overclustering performance; that superpixels are better than grids; and that regional masking improves performance. The expressiveness of our dense embeddings is demonstrated by improving on the state-of-the-art unsupervised semantic segmentation benchmark on Cityscapes, and for convolutional models on COCO.
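The core efficiency argument is that contrasting region-level embeddings instead of pixel-level ones shrinks the number of contrastive elements from H×W pixels to a few hundred regions. A minimal sketch of that region-pooling step, assuming dense features of shape (H, W, D) and a precomputed superpixel label map (the function name and shapes here are illustrative, not the paper's actual implementation):

```python
import numpy as np

def pool_features_by_region(features, labels):
    """Average-pool dense features over superpixel regions.

    features: (H, W, D) float array of per-pixel embeddings.
    labels:   (H, W) int array of superpixel ids in 0..K-1.
    Returns:  (K, D) array of region embeddings, one per superpixel.
    """
    H, W, D = features.shape
    flat_feat = features.reshape(-1, D)
    flat_lab = labels.reshape(-1)
    K = int(flat_lab.max()) + 1

    # Sum features into each region's slot, then divide by pixel counts.
    sums = np.zeros((K, D), dtype=float)
    np.add.at(sums, flat_lab, flat_feat)
    counts = np.bincount(flat_lab, minlength=K).astype(float)
    return sums / counts[:, None]
```

For a 1024x2048 Cityscapes image decomposed into a few hundred superpixels, this reduces roughly two million pixel embeddings to a few hundred region embeddings before any contrastive loss is computed, which is the ~O(1000) reduction the abstract refers to.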

Results

Task                               Dataset          Metric                 Value   Model
Semantic Segmentation              Cityscapes test  Accuracy               84.3    ViCE
Semantic Segmentation              Cityscapes test  mIoU                   25.2    ViCE
Semantic Segmentation              COCO-Stuff-27    Clustering [Accuracy]  64.8    ViCE
Semantic Segmentation              COCO-Stuff-27    Clustering [mIoU]      21.77   ViCE
Unsupervised Semantic Segmentation Cityscapes test  Accuracy               84.3    ViCE
Unsupervised Semantic Segmentation Cityscapes test  mIoU                   25.2    ViCE
Unsupervised Semantic Segmentation COCO-Stuff-27    Clustering [Accuracy]  64.8    ViCE
Unsupervised Semantic Segmentation COCO-Stuff-27    Clustering [mIoU]      21.77   ViCE

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)