Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels (just 4 labels per class). Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at https://github.com/google-research/fixmatch.
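
The unlabeled-data step described in the abstract (pseudo-label from the weak view, keep it only if confident, then fit the strong view to it) can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the linked repository: the function names, the list-of-logits input format, and the default `threshold=0.95` are assumptions made here for the example, and the augmentations and the model itself are left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Confidence-masked cross-entropy on a batch of unlabeled examples.

    weak_logits[i] and strong_logits[i] are the model outputs for the
    weakly and strongly augmented views of the same image (hypothetical
    inputs; producing them is outside this sketch).
    """
    if not weak_logits:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        if confidence < threshold:
            continue  # low-confidence prediction: pseudo-label is dropped
        pseudo_label = probs.index(confidence)  # hard (argmax) label
        # Cross-entropy of the strong-view prediction against the pseudo-label.
        # In a real training loop the weak-view prediction would be treated
        # as a constant target (no gradient flows through it).
        total += -math.log(softmax(strong)[pseudo_label])
    # Average over the whole batch, so dropped examples contribute zero.
    return total / len(weak_logits)
```

For example, `fixmatch_unlabeled_loss([[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]], [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])` uses only the first pair, because the second weak prediction falls well below the threshold. In a full training objective this term would be added to the usual supervised loss on the labeled examples.
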
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | STL-10 | Percentage correct | 94.83 | FixMatch (CTA) |
| Image Classification | STL-10 | Percentage correct | 94.77 | ReMixMatch |
| Image Classification | STL-10 | Percentage correct | 92.34 | UDA |
| Image Classification | STL-10 | Percentage correct | 92.02 | FixMatch (RA) |
| Image Classification | STL-10 | Percentage correct | 89.59 | MixMatch |
| Image Classification | STL-10 | Percentage correct | 78.57 | Mean Teacher |
| Image Classification | STL-10 | Percentage correct | 73.77 | Π-Model |
| Image Classification | STL-10 | Percentage correct | 72.01 | Pseudo-Labeling |
| Image Classification | CIFAR-10, 4000 Labels | Percentage error | 4.31 | FixMatch (CTA) |
| Image Classification | CIFAR-10, 400 Labels (OpenSet, 6/4) | Accuracy | 83.7 | FixMatch |
| Image Classification | CIFAR-100, 10000 Labels | Percentage error | 22.6 | FixMatch (RA, WRN-28-8) |
| Image Classification | CIFAR-10, 100 Labels (OpenSet, 6/4) | Accuracy | 70.2 | FixMatch |
| Image Classification | CIFAR-10, 50 Labels (OpenSet, 6/4) | Accuracy | 56.8 | FixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 4000 Labels | Percentage error | 4.31 | FixMatch (CTA) |
| Semi-Supervised Image Classification | CIFAR-10, 400 Labels (OpenSet, 6/4) | Accuracy | 83.7 | FixMatch |
| Semi-Supervised Image Classification | CIFAR-100, 10000 Labels | Percentage error | 22.6 | FixMatch (RA, WRN-28-8) |
| Semi-Supervised Image Classification | CIFAR-10, 100 Labels (OpenSet, 6/4) | Accuracy | 70.2 | FixMatch |
| Semi-Supervised Image Classification | CIFAR-10, 50 Labels (OpenSet, 6/4) | Accuracy | 56.8 | FixMatch |