FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

2018-11-27 | CVPR 2019 | Fine-Grained Visual Categorization, Disentanglement, Image Clustering, Conditional Image Generation

Abstract

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our code, models, and demo can be found at https://github.com/kkanshul/finegan
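
To make the idea of conditioning the relationships between codes more concrete, here is a minimal sketch of one way a shape/appearance hierarchy could be imposed at sampling time. It is not taken from the released code. The names `NUM_PARENT`, `CHILDREN_PER_PARENT`, `parent_of`, and `sample_codes`, the code counts, and the rule that each appearance code belongs to exactly one shape code are all illustrative assumptions.

```python
import random

# Hypothetical sizes, chosen only for illustration.
NUM_PARENT = 20            # assumed number of shape (parent) codes
CHILDREN_PER_PARENT = 10   # assumed number of appearance codes per shape code
NUM_CHILD = NUM_PARENT * CHILDREN_PER_PARENT
NUM_BACKGROUND = 200       # assumed number of background codes


def parent_of(child):
    """Map an appearance (child) code to its single shape (parent) code.

    Assumption: child codes are grouped into contiguous blocks, so every
    child has exactly one parent and every parent has the same number of
    children.
    """
    return child // CHILDREN_PER_PARENT


def sample_codes(rng):
    """Sample one (background, parent, child) triple respecting the hierarchy."""
    child = rng.randrange(NUM_CHILD)
    return {
        "background": rng.randrange(NUM_BACKGROUND),  # sampled independently here
        "parent": parent_of(child),                   # determined by the child code
        "child": child,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_codes(rng))
```

Because the parent code is derived from the child code rather than sampled freely, a generator that receives these codes can never pair an appearance with a shape outside its group, which is the kind of constraint that could induce a hierarchy.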

Results

Task	Dataset	Metric	Value	Model
Image Generation	CUB 128 x 128	FID	11.25	FineGAN
Image Generation	CUB 128 x 128	Inception score	52.53	FineGAN
Image Generation	Stanford Cars	FID	16.03	FineGAN
Image Generation	Stanford Cars	Inception score	32.62	FineGAN
Image Generation	Stanford Dogs	FID	25.66	FineGAN
Image Generation	Stanford Dogs	Inception score	46.92	FineGAN
Image Clustering	Stanford Cars	Accuracy	0.078	FineGAN
Image Clustering	Stanford Cars	NMI	0.354	FineGAN
Image Clustering	Stanford Dogs	Accuracy	0.079	FineGAN
Image Clustering	Stanford Dogs	NMI	0.233	FineGAN
Image Clustering	CUB Birds	Accuracy	0.126	FineGAN
Image Clustering	CUB Birds	NMI	0.403	FineGAN
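
For readers unfamiliar with the two clustering metrics in the table, the following sketch shows how they are commonly computed. It is an assumption that the reported numbers use these exact definitions: NMI is normalized here by the arithmetic mean of the two entropies, and accuracy uses the best one-to-one mapping between cluster ids and class labels. The function names `nmi` and `clustering_accuracy` are illustrative, and the brute-force search over mappings is only practical for toy inputs; with hundreds of classes, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be needed instead.

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_true, labels_pred):
    """Normalized mutual information: 2 * I(T; P) / (H(T) + H(P))."""
    n = len(labels_true)
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    count_tp = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # I(T; P) = sum over observed pairs of p(t, p) * log(p(t, p) / (p(t) * p(p)))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in count_tp.items()
    )
    denom = entropy(count_t) + entropy(count_p)
    # Both labelings are a single group: treat them as identical.
    return 1.0 if denom == 0 else 2.0 * mutual_info / denom


def clustering_accuracy(labels_true, labels_pred):
    """Fraction of samples correct under the best one-to-one cluster-to-class map."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    count_tp = Counter(zip(labels_true, labels_pred))
    # Pad with None so every cluster can be assigned something, even when
    # there are more clusters than classes (a None match scores zero hits).
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(count_tp[(t, p)] for t, p in zip(perm, clusters))
        best = max(best, hits)
    return best / len(labels_true)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [1, 1, 0, 0, 2, 0]
    print("NMI:", nmi(truth, pred))
    print("Accuracy:", clustering_accuracy(truth, pred))
```

Both metrics are invariant to how cluster ids are numbered, which matters here because an unsupervised method has no way to know which cluster index corresponds to which ground-truth category.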
