Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SERE: Exploring Feature Self-relation for Self-supervised Transformer

Zhong-Yu Li, ShangHua Gao, Ming-Ming Cheng

2022-06-10
Tasks: Self-Supervised Learning, Unsupervised Semantic Segmentation, Semantic Segmentation

Abstract

Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial/channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViT, resulting in stronger representations that stably improve performance on multiple downstream tasks. Our source code is publicly available at: https://github.com/MCG-NKU/SERE.
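The spatial and channel self-relations mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that a self-relation is a row-wise softmax over cosine similarities, computed between patches (spatial) or between channels (channel), and that two views are compared with a cross-entropy between their relation matrices. The function names, the `temperature` parameter, and the toy feature values are all hypothetical, and the two views are assumed to be spatially aligned patch by patch.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def self_relation(rows, temperature=1.0):
    """Row-wise softmax over pairwise cosine similarities between rows."""
    unit = [l2_normalize(r) for r in rows]
    sims = [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]
    return [softmax(s, temperature) for s in sims]


def spatial_self_relation(features, temperature=1.0):
    """features: N patches x C channels -> N x N patch-to-patch relation."""
    return self_relation(features, temperature)


def channel_self_relation(features, temperature=1.0):
    """features: N patches x C channels -> C x C channel-to-channel relation."""
    transposed = [list(col) for col in zip(*features)]
    return self_relation(transposed, temperature)


def relation_cross_entropy(target, pred, eps=1e-12):
    """Mean cross-entropy between matching rows of two relation matrices."""
    total = 0.0
    for t_row, p_row in zip(target, pred):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(target)


# Toy features for two augmented views: 4 patches, 3 channels (made-up values).
view_a = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [2.0, 0.1, 1.0], [0.0, 1.0, 1.0]]
view_b = [[0.9, 0.2, 1.8], [0.4, 1.6, 0.1], [1.7, 0.0, 1.2], [0.1, 0.9, 1.1]]

spatial_a = spatial_self_relation(view_a)
spatial_b = spatial_self_relation(view_b)
channel_a = channel_self_relation(view_a)
channel_b = channel_self_relation(view_b)

# In this sketch the total loss is the sum of both relation terms.
loss = (relation_cross_entropy(spatial_a, spatial_b)
        + relation_cross_entropy(channel_a, channel_b))
```

The contrast with embedding-only objectives is that the loss here compares how patches (or channels) relate to one another within each view, rather than comparing the feature vectors themselves.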

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ImageNet-S | mIoU (test) | 63.3 | SERE (ViT-B/16, 100ep, 224x224, SSL+FT) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 63 | SERE (ViT-B/16, 100ep, 224x224, SSL+FT) |
| Semantic Segmentation | ImageNet-S | mIoU (test) | 59 | SERE (ViT-S/16, 100ep, 224x224, SSL+FT, mmseg) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 59.4 | SERE (ViT-S/16, 100ep, 224x224, SSL+FT, mmseg) |
| Semantic Segmentation | ImageNet-S | mIoU (test) | 57.8 | SERE (ViT-S/16, 100ep, 224x224, SSL+FT) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 58.9 | SERE (ViT-S/16, 100ep, 224x224, SSL+FT) |
| Semantic Segmentation | ImageNet-S | mIoU (test) | 48.2 | SERE (ViT-B/16, 100ep, 224x224, SSL) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 48.6 | SERE (ViT-B/16, 100ep, 224x224, SSL) |
| Semantic Segmentation | ImageNet-S | mIoU (test) | 40.5 | SERE (ViT-S/16, 100ep, 224x224, SSL, mmseg) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 41 | SERE (ViT-S/16, 100ep, 224x224, SSL, mmseg) |
| Semantic Segmentation | ImageNet-S | mIoU (test) | 40.2 | SERE (ViT-S/16, 100ep, 224x224, SSL) |
| Semantic Segmentation | ImageNet-S | mIoU (val) | 41 | SERE (ViT-S/16, 100ep, 224x224, SSL) |
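For readers unfamiliar with the metric reported above, the following is a minimal sketch of mean intersection-over-union (mIoU) over flattened label maps. It is a generic illustration, not the ImageNet-S evaluation code: the function name `mean_iou`, the `ignore_index` parameter, and the toy labels are assumptions, and the benchmark's official protocol may differ in how it accumulates or excludes pixels.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that occur in either pred or target.

    pred and target are flat sequences of integer class labels of equal
    length; target pixels equal to ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and of the
            # predicted class (if the prediction is a valid class id).
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes and 6 pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU is 1/3, 2/3 and 1/2, so the mean is about 0.5.
```

Classes that appear in neither the prediction nor the ground truth are left out of the average here; other implementations make different choices on that point.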
