David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, Colin Raffel
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
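
The two ingredients named in the abstract, guessing low-entropy labels for augmented unlabeled examples and mixing examples with MixUp, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names, the temperature-based sharpening, the `Beta(alpha, alpha)` mixing coefficient, the `max(lam, 1 - lam)` step, and the default values `temperature=0.5` and `alpha=0.75` are all assumptions made for the example; the abstract itself does not specify them.

```python
import numpy as np


def sharpen(probs, temperature=0.5):
    """Lower the entropy of each row of a (batch, classes) probability array.

    Assumption: sharpening raises probabilities to the power 1/temperature
    and renormalizes; a temperature below 1 pushes mass toward the top class.
    """
    powered = probs ** (1.0 / temperature)
    return powered / powered.sum(axis=1, keepdims=True)


def guess_labels(aug_probs, temperature=0.5):
    """Average model predictions over augmentations, then sharpen.

    aug_probs has shape (num_augmentations, batch, classes) and holds the
    (hypothetical) model's class probabilities for each augmented copy.
    """
    return sharpen(aug_probs.mean(axis=0), temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convexly combine two batches of inputs and their label distributions."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    # Assumption: keep the result closer to the first batch than the second.
    lam = max(lam, 1.0 - lam)
    mixed_x = lam * x1 + (1.0 - lam) * x2
    mixed_y = lam * y1 + (1.0 - lam) * y2
    return mixed_x, mixed_y


def entropy(probs):
    """Shannon entropy (in nats) of each row of a probability array."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)
```

As a usage sketch, `guess_labels` would take the stacked predictions for several augmentations of an unlabeled batch, and `mixup` would then blend labeled inputs and their one-hot labels with unlabeled inputs and their guessed labels. Comparing `entropy` before and after sharpening shows the guessed labels becoming more confident.
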
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10 | Percentage correct | 95.05 | MixMatch |
| Image Classification | CIFAR-100 | Percentage correct | 74.1 | MixMatch |
| Image Classification | STL-10 | Percentage correct | 94.41 | MixMatch |
| Image Classification | STL-10 | Percentage correct | 89.82 | MixMatch |
| Image Classification | STL-10 | Percentage correct | 87.36 | CutOut |
| Image Classification | SVHN | Percentage error | 2.59 | MixMatch |
| Image Classification | CIFAR-10, 4000 Labels | Percentage error | 6.24 | MixMatch |
| Image Classification | CIFAR-10, 2000 Labels | Accuracy | 92.97 | MixMatch |
| Image Classification | STL-10, 1000 Labels | Accuracy | 89.82 | MixMatch |
| Image Classification | SVHN, 500 Labels | Accuracy | 96.36 | MixMatch |
| Image Classification | SVHN, 2000 Labels | Accuracy | 96.96 | MixMatch |
| Image Classification | CIFAR-10, 1000 Labels | Accuracy | 92.25 | MixMatch |
| Image Classification | CIFAR-10, 500 Labels | Accuracy | 91.35 | MixMatch |
| Image Classification | SVHN, 4000 Labels | Accuracy | 97.11 | MixMatch |
| Image Classification | SVHN, 1000 Labels | Accuracy | 96.73 | MixMatch |
| Image Classification | STL-10, 5000 Labels | Accuracy | 94.41 | MixMatch |
| Image Classification | SVHN, 250 Labels | Accuracy | 96.22 | MixMatch |
| Image Classification | CIFAR-10, 250 Labels | Percentage error | 11.08 | MixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 4000 Labels | Percentage error | 6.24 | MixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 2000 Labels | Accuracy | 92.97 | MixMatch |
| Semi-Supervised Image Classification | STL-10, 1000 Labels | Accuracy | 89.82 | MixMatch |
| Semi-Supervised Image Classification | SVHN, 500 Labels | Accuracy | 96.36 | MixMatch |
| Semi-Supervised Image Classification | SVHN, 2000 Labels | Accuracy | 96.96 | MixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 1000 Labels | Accuracy | 92.25 | MixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 500 Labels | Accuracy | 91.35 | MixMatch |
| Semi-Supervised Image Classification | SVHN, 4000 Labels | Accuracy | 97.11 | MixMatch |
| Semi-Supervised Image Classification | SVHN, 1000 Labels | Accuracy | 96.73 | MixMatch |
| Semi-Supervised Image Classification | STL-10, 5000 Labels | Accuracy | 94.41 | MixMatch |
| Semi-Supervised Image Classification | SVHN, 250 Labels | Accuracy | 96.22 | MixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage error | 11.08 | MixMatch |