ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring

David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

2019-11-21

Tasks: Image Classification, Semi-Supervised Image Classification


Abstract

We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between $5\times$ and $16\times$ less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach $93.73\%$ accuracy (compared to MixMatch's accuracy of $93.58\%$ with $4{,}000$ examples) and a median accuracy of $84.92\%$ with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
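The two techniques named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that distribution alignment rescales each unlabeled prediction by the ratio of the label marginal to a running average of model predictions and then renormalizes, and that "close to" in augmentation anchoring is measured by cross-entropy. The function names, the `eps` parameter, and the toy inputs are hypothetical.

```python
import numpy as np

def align_distribution(preds, label_marginal, running_pred_marginal, eps=1e-8):
    """Rescale predictions so their marginal moves toward the label marginal.

    preds: (batch, classes) class probabilities on unlabeled inputs.
    label_marginal: (classes,) class frequencies of the labeled data.
    running_pred_marginal: (classes,) average of recent model predictions.
    Assumption: ratio rescaling followed by renormalization.
    """
    scaled = preds * (label_marginal / (running_pred_marginal + eps))
    return scaled / scaled.sum(axis=1, keepdims=True)

def anchoring_loss(weak_probs, strong_logits):
    """Mean cross-entropy between a weak-augmentation target and K strong outputs.

    weak_probs: (batch, classes) prediction on the weakly augmented input.
    strong_logits: (K, batch, classes) logits for K strongly augmented copies.
    Assumption: cross-entropy is the closeness measure.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = strong_logits - strong_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # The same weak target is broadcast across all K strong views.
    return float(-(weak_probs[None] * log_probs).sum(axis=-1).mean())

# Toy usage: predictions skewed toward class 0, uniform label marginal.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
label_marginal = np.full(3, 1.0 / 3.0)
running_pred_marginal = preds.mean(axis=0)

aligned = align_distribution(preds, label_marginal, running_pred_marginal)

# K = 2 strong views with uniform (all-zero) logits.
strong_logits = np.zeros((2, 2, 3))
loss = anchoring_loss(aligned, strong_logits)
```

In the toy input the model over-predicts class 0, so alignment lowers the probability of class 0 in every row while each row still sums to one. With uniform strong-view logits the anchoring loss equals log(3) regardless of the target.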

Results

Task	Dataset	Metric	Value	Model
Image Classification	STL-10	Percentage correct	93.82	ReMixMatch (K=4)
Image Classification	STL-10	Percentage correct	93.23	ReMixMatch (K=1)
Image Classification	STL-10	Percentage correct	77.8	CC-GAN
Image Classification	CIFAR-10, 4000 Labels	Percentage error	5.14	ReMixMatch
Image Classification	STL-10, 1000 Labels	Accuracy	93.82	ReMixMatch
Image Classification	SVHN, 1000 Labels	Accuracy	97.17	ReMixMatch
Image Classification	CIFAR-10, 250 Labels	Percentage correct	93.73	ReMixMatch
Image Classification	CIFAR-10, 40 Labels	Percentage error	19.1	ReMixMatch
Image Classification	CIFAR-10, 250 Labels	Percentage error	6.27	ReMixMatch
Semi-Supervised Image Classification	CIFAR-10, 4000 Labels	Percentage error	5.14	ReMixMatch
Semi-Supervised Image Classification	STL-10, 1000 Labels	Accuracy	93.82	ReMixMatch
Semi-Supervised Image Classification	SVHN, 1000 Labels	Accuracy	97.17	ReMixMatch
Semi-Supervised Image Classification	CIFAR-10, 250 Labels	Percentage correct	93.73	ReMixMatch
Semi-Supervised Image Classification	CIFAR-10, 40 Labels	Percentage error	19.1	ReMixMatch
Semi-Supervised Image Classification	CIFAR-10, 250 Labels	Percentage error	6.27	ReMixMatch
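The table mixes "Percentage correct" and "Percentage error" rows, so a short conversion helps when comparing them. The sketch below uses only figures from this page; the helper names are hypothetical. It also computes the label ratio between the 4,000-label MixMatch setting and the 250-label ReMixMatch setting quoted in the abstract, which appears to correspond to the upper end of the stated 5x to 16x range.

```python
def error_to_accuracy(error_pct):
    """Convert a percentage error into a percentage correct."""
    return round(100.0 - error_pct, 2)

def label_reduction(baseline_labels, labels):
    """How many times fewer labels are used relative to a baseline."""
    return baseline_labels / labels

# CIFAR-10, 250 labels: 6.27% error is the same result as 93.73% correct.
acc_250 = error_to_accuracy(6.27)

# CIFAR-10, 40 labels: 19.1% error expressed as accuracy.
acc_40 = error_to_accuracy(19.1)

# 250 labels versus the 4,000 labels quoted for MixMatch.
factor = label_reduction(4000, 250)
```

Note that the 40-label row reports 19.1% error, while the abstract quotes a median accuracy of 84.92% with four labels per class; the two figures may use different aggregation, so they need not agree exactly.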
