Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We propose FineGAN, a novel unsupervised GAN framework that disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement and generates realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our code, models, and demo can be found at https://github.com/kkanshul/finegan
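
One way to picture "conditioning the relationships between the codes" is to sample the codes jointly so that one code determines another. The sketch below is illustrative only and is not taken from the released code: the function names, the code counts, and the grouping rule (consecutive shape/appearance indices sharing one coarser code) are all assumptions made for the example.

```python
import random

def sample_codes(num_parents, children_per_parent, num_backgrounds, rng):
    """Sample (background, parent, child) indices with the child tied to a parent.

    All names and the grouping rule here are hypothetical, chosen for illustration.
    """
    num_children = num_parents * children_per_parent
    child = rng.randrange(num_children)
    # Assumed hierarchy: a fixed number of consecutive child codes share one parent.
    parent = child // children_per_parent
    # Assumed here: the background code is drawn independently of the other two.
    background = rng.randrange(num_backgrounds)
    return background, parent, child

def one_hot(index, size):
    """Encode a categorical index as a one-hot list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]

rng = random.Random(0)
b, p, c = sample_codes(num_parents=20, children_per_parent=10,
                       num_backgrounds=200, rng=rng)
codes = (one_hot(b, 200), one_hot(p, 20), one_hot(c, 200))
```

Because the coarser code is computed from the finer one rather than sampled separately, every sampled pair respects the hierarchy by construction.
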
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | CUB 128 x 128 | FID | 11.25 | FineGAN |
| Image Generation | CUB 128 x 128 | Inception score | 52.53 | FineGAN |
| Image Generation | Stanford Cars | FID | 16.03 | FineGAN |
| Image Generation | Stanford Cars | Inception score | 32.62 | FineGAN |
| Image Generation | Stanford Dogs | FID | 25.66 | FineGAN |
| Image Generation | Stanford Dogs | Inception score | 46.92 | FineGAN |
| Image Clustering | Stanford Cars | Accuracy | 0.078 | FineGAN |
| Image Clustering | Stanford Cars | NMI | 0.354 | FineGAN |
| Image Clustering | Stanford Dogs | Accuracy | 0.079 | FineGAN |
| Image Clustering | Stanford Dogs | NMI | 0.233 | FineGAN |
| Image Clustering | CUB Birds | Accuracy | 0.126 | FineGAN |
| Image Clustering | CUB Birds | NMI | 0.403 | FineGAN |
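
For readers unfamiliar with the NMI metric in the clustering rows, the following is a minimal sketch of normalized mutual information between ground-truth labels and cluster assignments. It assumes normalization by the geometric mean of the two entropies; the page does not state which NMI variant was used to produce the numbers above, and the function names are chosen for this example only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )

def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one labeling has a single class.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)

true_labels = [0, 0, 1, 1]
cluster_ids = [1, 1, 0, 0]   # same partition, different label names
score = nmi(true_labels, cluster_ids)
```

NMI is invariant to how clusters are numbered, which is why it suits unsupervised category discovery, where cluster indices have no fixed correspondence to class labels.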