InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

2016-06-12NeurIPS 2016 12Representation Learning Unsupervised MNIST Image Generation Unsupervised Image Classification

Paper PDF Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code

Abstract

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

Results

Task	Dataset	Metric	Value	Model
Image Generation	CUB 128 x 128	FID	13.2	InfoGAN
Image Generation	CUB 128 x 128	Inception score	47.32	InfoGAN
Image Generation	Stanford Cars	FID	17.63	InfoGAN
Image Generation	Stanford Cars	Inception score	28.62	InfoGAN
Image Generation	Stanford Dogs	FID	29.34	InfoGAN
Image Generation	Stanford Dogs	Inception score	43.16	InfoGAN
Image Classification	MNIST	Accuracy	95	InfoGAN

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper2025-07-20 Spectral Bellman Method: Unifying Representation and Exploration in RL2025-07-17 Boosting Team Modeling through Tempo-Relational Representation Learning2025-07-17 fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17 Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17 FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17 Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17