mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
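The core idea — training on convex combinations of pairs of examples and their labels — can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation; the function name `mixup_batch` and the default `alpha=0.2` are illustrative (the paper draws the mixing coefficient from a Beta(α, α) distribution).

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two batches and their one-hot labels.

    A sketch of mixup: lam ~ Beta(alpha, alpha), then
    x = lam * x1 + (1 - lam) * x2, and the same for the labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))  # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

In practice (as in the paper's published training code), the second batch is typically just a shuffled copy of the first, so one forward pass over mixed inputs suffices and no extra data loading is needed.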
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10 | Percentage correct | 97.3 | DenseNet-BC-190 + Mixup |
| Image Classification | Kuzushiji-MNIST | Accuracy | 98.41 | PreActResNet-18 + Input Mixup |
| Image Classification | CIFAR-100 | Percentage correct | 83.2 | DenseNet-BC-190 + Mixup |
| Semi-Supervised Image Classification | SVHN, 250 Labels | Accuracy | 60.03 | MixUp |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage error | 47.43 | MixUp |
| Domain Generalization | ImageNet-A | Top-1 accuracy % | 6.6 | Mixup (ResNet-50) |