Varun Nair, Javier Fuentes Alonso, Tony Beltramelli
Semi-Supervised Learning (SSL) algorithms have shown great potential in training regimes where labeled data is scarce but unlabeled data is plentiful. However, our experiments illustrate several shortcomings of prior SSL algorithms, in particular poor performance when the unlabeled and labeled data distributions differ. To address these shortcomings, we develop RealMix, which achieves state-of-the-art results on standard benchmark datasets across different labeled and unlabeled set sizes while overcoming the aforementioned challenges. Notably, RealMix achieves an error rate of 9.79% on CIFAR-10 with 250 labels and is the only SSL method tested that surpasses baseline performance when there is a significant mismatch between the labeled and unlabeled data distributions. RealMix demonstrates how SSL can be used in real-world situations with limited access to both data and compute, and it guides further research in SSL with practical applicability in mind.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10, 4000 Labels | Percentage error | 6.38 | RealMix |
| Image Classification | CIFAR-10, 250 Labels | Percentage correct | 90.21 | RealMix |
| Image Classification | SVHN, 250 Labels | Accuracy | 96.47 | RealMix |
| Image Classification | CIFAR-10, 250 Labels | Percentage error | 7.6 | EnAET |
| Image Classification | CIFAR-10, 250 Labels | Percentage error | 9.79 | RealMix |
| Semi-Supervised Image Classification | CIFAR-10, 4000 Labels | Percentage error | 6.38 | RealMix |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage correct | 90.21 | RealMix |
| Semi-Supervised Image Classification | SVHN, 250 Labels | Accuracy | 96.47 | RealMix |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage error | 7.6 | EnAET |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage error | 9.79 | RealMix |
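The table reports the 250-label CIFAR-10 result for RealMix under two metrics, percentage error (9.79) and percentage correct (90.21). These appear to describe the same result, since the two values sum to 100. A minimal Python sketch of the conversion follows; the function name `error_to_accuracy` is illustrative and does not come from the paper.

```python
def error_to_accuracy(error_pct: float) -> float:
    """Convert a percentage error into a percentage correct (100 - error)."""
    if not 0.0 <= error_pct <= 100.0:
        raise ValueError("error_pct must be between 0 and 100")
    # Round to two decimals to avoid floating-point residue in the subtraction.
    return round(100.0 - error_pct, 2)


# Error rates taken from the RealMix rows of the results table.
realmix_250_labels = error_to_accuracy(9.79)
realmix_4000_labels = error_to_accuracy(6.38)
print(realmix_250_labels, realmix_4000_labels)
```

Converting every row to a single metric this way makes the entries directly comparable across methods.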