Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement

Martin Engelcke, Oiwi Parker Jones, Ingmar Posner

2021-04-20 | NeurIPS 2021 12
Tasks: Unsupervised Image Segmentation, Representation Learning, Scene Generation, Clustering, Image Generation, Unsupervised Object Segmentation, Image Segmentation

Abstract

Advances in unsupervised learning of object representations have culminated in the development of a broad range of methods for unsupervised object segmentation and interpretable object-centric scene generation. These methods, however, are limited to simulated and real-world datasets with limited visual complexity. Moreover, object representations are often inferred using RNNs, which do not scale well to large images, or iterative refinement, which avoids imposing an unnatural ordering on objects in an image but requires the a priori initialisation of a fixed number of object representations. In contrast to established paradigms, this work proposes an embedding-based approach in which embeddings of pixels are clustered in a differentiable fashion using a stochastic stick-breaking process. Similar to iterative refinement, this clustering procedure also leads to randomly ordered object representations, but without the need to initialise a fixed number of clusters a priori. This is used to develop a new model, GENESIS-v2, which can infer a variable number of object representations without using RNNs or iterative refinement. We show that GENESIS-v2 performs strongly in comparison to recent baselines in terms of unsupervised image segmentation and object-centric scene generation on established synthetic datasets as well as more complex real-world datasets.
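
To make the clustering idea in the abstract more concrete, here is a minimal NumPy sketch of a stochastic stick-breaking procedure over pixel embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function name `stick_breaking_masks`, the Gaussian kernel, the `sigma` parameter, and the fixed upper bound `max_steps` are all choices made for this example, and the sketch omits the differentiable training setup entirely. A stopping rule based on the remaining scope could replace the fixed step count.

```python
import numpy as np


def stick_breaking_masks(embeddings, max_steps, sigma=1.0, rng=None):
    """Cluster pixel embeddings into soft masks by stick-breaking.

    embeddings: array of shape (H, W, D), one embedding per pixel.
    Returns an array of shape (max_steps, H, W) whose masks are
    non-negative and sum to 1 at every pixel.
    All names and the kernel choice are assumptions for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = embeddings.shape
    scope = np.ones((h, w))  # share of each pixel not yet assigned
    masks = []
    for _ in range(max_steps - 1):
        # Sample a seed pixel at random, favouring unassigned pixels.
        scores = rng.uniform(size=(h, w)) * scope
        i, j = np.unravel_index(np.argmax(scores), scores.shape)
        seed = embeddings[i, j]
        # Similarity of every pixel embedding to the seed (Gaussian kernel).
        sq_dist = np.sum((embeddings - seed) ** 2, axis=-1)
        alpha = np.exp(-sq_dist / (2.0 * sigma**2))
        # Break off a piece of the remaining "stick" for this cluster.
        masks.append(scope * alpha)
        scope = scope * (1.0 - alpha)
    # The final mask collects whatever scope is left over.
    masks.append(scope)
    return np.stack(masks)
```

Because each seed is drawn at random, the order in which clusters appear is random as well, which mirrors the unordered object representations the abstract describes.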

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | ShapeStacks | FID | 112.7 | GENESIS-V2 |
| Image Generation | ShapeStacks | FID | 186.8 | GENESIS |
| Image Generation | ShapeStacks | FID | 197.8 | MONET-G |
| Image Generation | ObjectsRoom | FID | 52.6 | GENESIS-V2 |
| Image Generation | ObjectsRoom | FID | 62.8 | GENESIS |
| Image Generation | ObjectsRoom | FID | 205.7 | MONET-G |
| Instance Segmentation | Shelf&Tote Training Dataset | ARI | 0.55 | GENESIS-V2 |
| Instance Segmentation | Shelf&Tote Training Dataset | ARI | 0.11 | MONET-G |
| Instance Segmentation | Shelf&Tote Training Dataset | ARI | 0.04 | GENESIS |
| Instance Segmentation | Shelf&Tote Training Dataset | ARI | 0.03 | SlotAttention |
| Instance Segmentation | ShapeStacks | ARI-FG | 0.81 | GENESIS-V2 |
| Instance Segmentation | ShapeStacks | ARI-FG | 0.76 | SlotAttention |
| Instance Segmentation | ShapeStacks | ARI-FG | 0.7 | GENESIS |
| Instance Segmentation | ShapeStacks | ARI-FG | 0.7 | MONET-G |
| Instance Segmentation | ObjectsRoom | ARI-FG | 0.84 | GENESIS-V2 |
| Instance Segmentation | ObjectsRoom | ARI-FG | 0.79 | SlotAttention |
| Instance Segmentation | ObjectsRoom | ARI-FG | 0.63 | GENESIS |
| Instance Segmentation | ObjectsRoom | ARI-FG | 0.54 | MONET-G |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | ARI | 0.55 | GENESIS-V2 |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | ARI | 0.11 | MONET-G |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | ARI | 0.04 | GENESIS |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | ARI | 0.03 | SlotAttention |
| Unsupervised Object Segmentation | ShapeStacks | ARI-FG | 0.81 | GENESIS-V2 |
| Unsupervised Object Segmentation | ShapeStacks | ARI-FG | 0.76 | SlotAttention |
| Unsupervised Object Segmentation | ShapeStacks | ARI-FG | 0.7 | GENESIS |
| Unsupervised Object Segmentation | ShapeStacks | ARI-FG | 0.7 | MONET-G |
| Unsupervised Object Segmentation | ObjectsRoom | ARI-FG | 0.84 | GENESIS-V2 |
| Unsupervised Object Segmentation | ObjectsRoom | ARI-FG | 0.79 | SlotAttention |
| Unsupervised Object Segmentation | ObjectsRoom | ARI-FG | 0.63 | GENESIS |
| Unsupervised Object Segmentation | ObjectsRoom | ARI-FG | 0.54 | MONET-G |
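
The segmentation rows above report ARI and ARI-FG. As a rough guide to how such scores are typically computed, the sketch below implements the Adjusted Rand Index from a contingency table and an ARI-FG variant that scores only pixels labelled as foreground in the ground truth. This is a hedged illustration, not the evaluation code behind the reported numbers: the function names and the convention that background has label `0` (`background_id`) are assumptions for this example.

```python
import numpy as np


def _comb2(x):
    """Number of unordered pairs, n choose 2, applied elementwise."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same pixels."""
    t = np.asarray(true_labels).ravel()
    p = np.asarray(pred_labels).ravel()
    if t.size < 2:
        return 1.0  # degenerate case: no pairs to compare
    # Map arbitrary label ids to 0..K-1 so they can index a table.
    _, t = np.unique(t, return_inverse=True)
    _, p = np.unique(p, return_inverse=True)
    contingency = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(contingency, (t, p), 1)
    sum_cells = _comb2(contingency).sum()
    sum_rows = _comb2(contingency.sum(axis=1)).sum()
    sum_cols = _comb2(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / _comb2(t.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # e.g. both labelings put every pixel in one cluster
    return (sum_cells - expected) / (max_index - expected)


def ari_fg(true_seg, pred_seg, background_id=0):
    """ARI restricted to ground-truth foreground pixels.

    background_id=0 is an assumed labelling convention.
    """
    true_seg = np.asarray(true_seg)
    pred_seg = np.asarray(pred_seg)
    fg = true_seg != background_id
    return adjusted_rand_index(true_seg[fg], pred_seg[fg])
```

For example, a prediction that splits the background into two segments but separates the objects correctly is penalised by full ARI while still reaching an ARI-FG of 1, which is one reason the two metrics are reported separately.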

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)