
MONet: Unsupervised Scene Decomposition and Representation

Christopher P. Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, Alexander Lerchner

Published 2019-01-22 · Tasks: Object Discovery, Unsupervised Object Segmentation

Abstract

The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve data efficiency and transfer performance on a wide range of tasks. Thus we need models capable of discovering useful decompositions of scenes by identifying units with such regularities and representing them in a common format. To address this problem, we have developed the Multi-Object Network (MONet). In this model, a VAE is trained end-to-end together with a recurrent attention network -- in a purely unsupervised manner -- to provide attention masks around, and reconstructions of, regions of images. We show that this model is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements.
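
The abstract describes the core mechanism: a recurrent attention network carves the image into masks, slot by slot, while a component VAE encodes and reconstructs each masked region. The following is a minimal PyTorch sketch of that recurrence under stated assumptions, not the authors' implementation: the tiny conv/linear networks, the slot count, and the omission of MONet's full mixture likelihood and mask-reconstruction loss terms are all simplifications (the paper uses a U-Net attention network and a component VAE with a spatial-broadcast decoder).

```python
# Minimal sketch of the MONet recurrence. Network sizes and the loss terms
# shown here are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionNet(nn.Module):
    """Stand-in for the paper's U-Net attention network: takes the image and
    the current log-scope, and splits the scope into a mask and a remainder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x, log_scope):
        logits = self.net(torch.cat([x, log_scope], dim=1))
        log_mask = log_scope + F.logsigmoid(logits)        # this slot's mask
        log_scope = log_scope + F.logsigmoid(-logits)      # remaining scope
        return log_mask, log_scope

class ComponentVAE(nn.Module):
    """Toy component VAE: encodes (image, log-mask), decodes a reconstruction
    of the masked region and returns the KL term of the latent."""
    def __init__(self, latent=16, size=64):
        super().__init__()
        self.size = size
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(4 * size * size, 2 * latent))
        self.dec = nn.Linear(latent, 3 * size * size)

    def forward(self, x, log_mask):
        mu, logvar = self.enc(torch.cat([x, log_mask], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(z).view(-1, 3, self.size, self.size)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
        return recon, kl

def monet_step(image, num_slots=5):
    """One unsupervised decomposition pass with freshly initialized (untrained)
    networks, just to show the wiring: slot k takes a mask, the scope shrinks,
    and the last slot absorbs whatever scope remains."""
    attn, cvae = AttentionNet(), ComponentVAE(size=image.shape[-1])
    log_scope = torch.zeros_like(image[:, :1])   # scope starts as the full image
    log_masks, recons, kls = [], [], []
    for k in range(num_slots):
        if k < num_slots - 1:
            log_mask, log_scope = attn(image, log_scope)
        else:
            log_mask = log_scope                 # final slot takes the remainder
        recon, kl = cvae(image, log_mask)
        log_masks.append(log_mask); recons.append(recon); kls.append(kl)
    # Mixture reconstruction: the masks sum to one over slots by construction.
    recon_mix = sum(m.exp() * r for m, r in zip(log_masks, recons))
    return recon_mix, log_masks, sum(kls)

if __name__ == "__main__":
    x = torch.rand(2, 3, 64, 64)                 # dummy batch of RGB images
    recon, log_masks, kl = monet_step(x)
    print(recon.shape, len(log_masks), kl.shape)
```

The log-scope recursion is the key design choice: each slot's mask is a sigmoid split of the scope left over by earlier slots, so the masks form a valid partition of the image without any explicit normalization across slots.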

Related Papers

When Does Pruning Benefit Vision Representations? (2025-07-02)
FORLA: Federated Object-centric Representation Learning with Slot Attention (2025-06-03)
Binding threshold units with artificial oscillatory neurons (2025-05-06)
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning (2025-05-04)
Object Learning and Robust 3D Reconstruction (2025-04-22)
Are We Done with Object-Centric Learning? (2025-04-09)
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning (2025-03-27)
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion (2025-03-19)