Bruno Sauvalle, Arnaud de La Fortelle
We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector with each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder reconstructs the background. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
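To give an intuition for translation-equivariant coordinate prediction, here is a minimal sketch (a generic illustration, not the paper's exact mechanism): a soft-argmax over a 2D attention score map yields object coordinates as an attention-weighted average of pixel positions, so translating the score map translates the predicted coordinates by the same amount. The function name and toy score maps are illustrative assumptions.

```python
import math

def soft_argmax_2d(scores):
    """Soft-argmax over a 2D score map: returns the expected (row, col).

    Each coordinate is a softmax-weighted average of pixel positions, so
    shifting the score map shifts the output by (approximately) the same
    offset: the operation is translation-equivariant up to boundary effects.
    """
    flat = [v for row in scores for v in row]
    m = max(flat)  # subtract the max for numerical stability
    weights = [[math.exp(v - m) for v in row] for row in scores]
    z = sum(sum(row) for row in weights)
    r = sum(i * w for i, row in enumerate(weights) for w in row)
    c = sum(j * w for row in weights for j, w in enumerate(row))
    return r / z, c / z

# A peaked score map, and a copy shifted one row down and one column right.
base = [[0.0] * 5 for _ in range(5)]
base[1][2] = 10.0
shifted = [[0.0] * 5 for _ in range(5)]
shifted[2][3] = 10.0

r0, c0 = soft_argmax_2d(base)
r1, c1 = soft_argmax_2d(shifted)
print(r1 - r0, c1 - c0)  # both close to 1.0: the prediction shifted with the map
```

Because the output is differentiable in the scores, such a layer can be trained end-to-end, unlike a hard argmax.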
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Instance Segmentation | ShapeStacks | ARI-FG | 0.82 | AST |
| Instance Segmentation | ObjectsRoom | ARI-FG | 0.87 | AST |