Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

2022-02-23 · CVPR 2022

Tasks: Weakly Supervised Object Detection · Unsupervised Saliency Detection · Object Discovery · Single-object discovery · Weakly-Supervised Object Localization · Object Detection · Saliency Detection

Paper · PDF · Code

Abstract

Transformers trained with self-supervised learning using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph, with edge weights given by a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state of the art, LOST, by margins of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K respectively. The performance can be further improved by adding a second-stage class-agnostic detector (CAD). Our method can easily be extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON respectively, compared to the previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.
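The pipeline described in the abstract — a token-similarity graph, a normalized cut solved via a generalized eigen-decomposition, and a bipartition read off the second smallest eigenvector — can be sketched as follows. This is a minimal NumPy/SciPy illustration rather than the authors' released implementation: `tau` and `eps` are illustrative hyperparameters, and `features` is assumed to be an (N, D) array of per-token transformer features (e.g. DINO keys).

```python
import numpy as np
from scipy.linalg import eigh

def tokencut_bipartition(features, tau=0.2, eps=1e-5):
    """Sketch of the normalized-cut step described in the abstract.

    features: (N, D) array of visual-token features, one row per token.
    Returns a boolean foreground mask of length N. `tau` (similarity
    threshold) and `eps` (weight for dissimilar pairs) are illustrative
    values, not the paper's exact hyperparameters.
    """
    # Edge weights: cosine similarity between tokens, thresholded so the
    # graph keeps strong connections between self-similar regions.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = f @ f.T
    W = np.where(W > tau, 1.0, eps)  # binarized connectivity score

    # Normalized cut via the generalized eigenproblem (D - W) x = lam D x.
    D = np.diag(W.sum(axis=1))
    _, eigvecs = eigh(D - W, D)      # eigenvalues in ascending order
    second = eigvecs[:, 1]           # second smallest eigenvector

    # Bipartition tokens around the eigenvector's mean; following the
    # abstract, the side containing the largest-magnitude entry is
    # taken as the foreground group.
    mask = second > second.mean()
    if not mask[np.argmax(np.abs(second))]:
        mask = ~mask
    return mask
```

On two well-separated clusters of token features, the second eigenvector is close to piecewise constant with opposite signs per cluster, so thresholding it at its mean recovers the two groups.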

Results

| Task                    | Dataset      | Metric                          | Value | Model          |
|-------------------------|--------------|---------------------------------|-------|----------------|
| Saliency Detection      | ECSSD        | Accuracy                        | 93.4  | TokenCut       |
| Saliency Detection      | ECSSD        | IoU                             | 77.2  | TokenCut       |
| Saliency Detection      | ECSSD        | maximal F-measure               | 87.4  | TokenCut       |
| Saliency Detection      | DUT-OMRON    | Accuracy                        | 89.7  | TokenCut       |
| Saliency Detection      | DUT-OMRON    | IoU                             | 61.8  | TokenCut       |
| Saliency Detection      | DUT-OMRON    | maximal F-measure               | 69.7  | TokenCut       |
| Saliency Detection      | DUTS         | Accuracy                        | 91.4  | TokenCut       |
| Saliency Detection      | DUTS         | IoU                             | 62.4  | TokenCut       |
| Saliency Detection      | DUTS         | maximal F-measure               | 75.5  | TokenCut       |
| Object Localization     | ImageNet     | GT-known localization accuracy  | 65.4  | TokenCut       |
| Object Localization     | ImageNet     | Top-1 localization accuracy     | 52.3  | TokenCut       |
| Object Localization     | CUB-200-2011 | Top-1 localization accuracy     | 72.9  | TokenCut       |
| Single-object discovery | COCO_20k     | CorLoc                          | 62.6  | TokenCut + CAD |
| Single-object discovery | COCO_20k     | CorLoc                          | 58.8  | TokenCut       |

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
- ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
- Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)