Andrew Stirn, David A. Knowles
Current clustering priors for deep latent variable models (DLVMs) require the number of clusters to be defined a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and empirical Bayes to cleanly distinguish variational parameters from prior parameters. Using the VMM in a variational autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Clustering | Fashion-MNIST | Accuracy | 0.716 | VMM |
| Image Clustering | Fashion-MNIST | NMI | 0.71 | VMM |
| Image Clustering | MNIST-full | Accuracy | 0.967 | VMM |
| Image Clustering | MNIST-full | NMI | 0.92 | VMM |
| Image Classification | MNIST | Accuracy (%) | 96.74 | VMM |
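
The table reports clustering accuracy and NMI but does not say how they were computed. The sketch below shows one common convention, which is an assumption here and not necessarily the authors' evaluation code: accuracy under the best one-to-one mapping from predicted clusters to true classes, and mutual information normalized by the arithmetic mean of the two label entropies. The function names `clustering_accuracy` and `normalized_mutual_information` are illustrative. The matching is done by brute force over permutations to keep the example dependency-free, which is only practical for a handful of clusters; a Hungarian solver such as `scipy.optimize.linear_sum_assignment` is the usual choice at larger scale.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(labels_true, labels_pred):
    """Fraction of samples matched under the best one-to-one cluster-to-class map.

    Assumption: brute-force search over permutations, so only suitable for
    a small number of clusters and classes.
    """
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    # joint[(cluster, class)] = number of samples with that pair of labels
    joint = Counter(zip(labels_pred, labels_true))
    # Pad the shorter side with None so every permutation is one-to-one;
    # pairs involving None never occur in the data and count as zero matches.
    size = max(len(classes), len(clusters))
    padded_classes = classes + [None] * (size - len(classes))
    padded_clusters = clusters + [None] * (size - len(clusters))
    best = 0
    for perm in permutations(padded_classes):
        matched = sum(joint[(k, c)] for k, c in zip(padded_clusters, perm))
        best = max(best, matched)
    return best / len(labels_true)


def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    entropy_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    entropy_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    denom = (entropy_true + entropy_pred) / 2
    # Both labelings constant: treat as perfect agreement by convention.
    return mutual_info / denom if denom > 0 else 1.0
```

Both metrics are invariant to how cluster indices are numbered, which matters for a prior like the VMM where the cluster labels carry no fixed meaning. When the model uses more clusters than there are classes, the one-to-one mapping leaves the surplus clusters unmatched, so their samples count as errors.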