Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

2022-02-23 · CVPR 2022
Tasks: Weakly Supervised Object Detection, Unsupervised Saliency Detection, Object Discovery, Single-object discovery, Weakly-Supervised Object Localization, object-detection, Object Detection, Saliency Detection


Abstract

Transformers trained with self-supervised learning using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses self-supervised transformer features to discover an object in an image. Visual tokens are viewed as nodes in a weighted graph whose edges carry a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state of the art, LOST, by margins of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K, respectively. Performance can be further improved by adding a second-stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON, respectively, compared to the previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.
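The normalized-cut step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The function name `normalized_cut_bipartition`, the cosine-similarity threshold `tau=0.2`, the small floor weight `eps=1e-5` for weak edges, and the split of the eigenvector at its mean value are all choices made for this sketch. Rather than calling a generalized eigensolver, it uses the equivalent symmetric form. With degree matrix D, the problem (D - W)x = λDx has the same eigenvalues as D^{-1/2}(D - W)D^{-1/2}y = λy, where x = D^{-1/2}y. Following the abstract, the side of the cut that contains the token with the largest |x| is taken as foreground.

```python
import numpy as np


def normalized_cut_bipartition(features, tau=0.2, eps=1e-5):
    """Split N tokens into foreground/background with a normalized cut.

    features: (N, C) array of non-zero token features (e.g. from a ViT).
    tau, eps: hypothetical affinity threshold and floor weight (assumptions,
    not values taken from this page).
    Returns a boolean array of length N, True for foreground tokens.
    """
    # Cosine similarity between every pair of tokens.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Thresholded affinity: similar tokens get weight 1, others a small floor,
    # so every node has a positive degree and D is invertible.
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    # (D - W) x = lambda D x  <=>  D^-1/2 (D - W) D^-1/2 y = lambda y, x = D^-1/2 y.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues returned in ascending order
    x = d_inv_sqrt * vecs[:, 1]  # second smallest generalized eigenvector
    # Bipartition the tokens around the mean of the eigenvector (assumption).
    part = x > x.mean()
    # The eigenvector's sign is arbitrary: pick the side holding the token
    # with the largest absolute value as the foreground.
    seed = np.argmax(np.abs(x))
    return part if part[seed] else ~part


# Toy usage: two orthogonal groups of token features (3 vs. 5 tokens).
toy = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 5)
mask = normalized_cut_bipartition(toy)  # the 3-token group is marked foreground
```

In this toy case the smaller group gets the larger eigenvector magnitudes, so it lands on the foreground side. In a real pipeline the token mask would still need to be mapped back to the patch grid and turned into a box or segment. That step is not described on this page.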

Results

Task	Dataset	Metric	Value (%)	Model
Saliency Detection	ECSSD	Accuracy	93.4	TokenCut
Saliency Detection	ECSSD	IoU	77.2	TokenCut
Saliency Detection	ECSSD	maximal F-measure	87.4	TokenCut
Saliency Detection	DUT-OMRON	Accuracy	89.7	TokenCut
Saliency Detection	DUT-OMRON	IoU	61.8	TokenCut
Saliency Detection	DUT-OMRON	maximal F-measure	69.7	TokenCut
Saliency Detection	DUTS	Accuracy	91.4	TokenCut
Saliency Detection	DUTS	IoU	62.4	TokenCut
Saliency Detection	DUTS	maximal F-measure	75.5	TokenCut
Object Localization	ImageNet	GT-known localization accuracy	65.4	TokenCut
Object Localization	ImageNet	Top-1 Localization Accuracy	52.3	TokenCut
Object Localization	CUB	Top-1 Localization Accuracy	72.9	TokenCut
Object Localization	CUB-200-2011	Top-1 Localization Accuracy	72.9	TokenCut
Single-object discovery	COCO_20k	CorLoc	62.6	TokenCut + CAD
Single-object discovery	COCO_20k	CorLoc	58.8	TokenCut
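The single-object discovery rows report CorLoc. This sketch assumes the commonly used definition, since this page does not spell it out: CorLoc is the percentage of images for which the single predicted box has an IoU of at least 0.5 with at least one ground-truth box. The pure-Python version below also assumes boxes in continuous (x1, y1, x2, y2) coordinates with x2 > x1 and y2 > y1. The helper names `box_iou` and `corloc` are hypothetical. The exact evaluation code behind the numbers above is not given here.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Intersection is zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def corloc(pred_boxes, gt_boxes_per_image, thresh=0.5):
    """Percentage of images whose one predicted box matches any GT box.

    pred_boxes: one predicted box per image.
    gt_boxes_per_image: for each image, a list of ground-truth boxes.
    """
    hits = sum(
        any(box_iou(pred, gt) >= thresh for gt in gts)
        for pred, gts in zip(pred_boxes, gt_boxes_per_image)
    )
    return 100.0 * hits / len(pred_boxes)
```

For example, if one of two images has its predicted box exactly on a ground-truth box and the other misses entirely, `corloc` returns 50.0.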
