mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
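The core idea — training on convex combinations of pairs of examples and their labels — can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation; the function name `mixup_batch` and the default `alpha=0.2` are illustrative (the paper draws the mixing coefficient from a Beta(α, α) distribution).

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two batches and their one-hot labels.

    A sketch of mixup: lam ~ Beta(alpha, alpha), then
    x = lam * x1 + (1 - lam) * x2, and the same for the labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))  # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

In practice (as in the paper's published training code), the second batch is typically just a shuffled copy of the first, so one forward pass over mixed inputs suffices and no extra data loading is needed.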
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10 | Percentage correct | 97.3 | DenseNet-BC-190 + Mixup |
| Image Classification | Kuzushiji-MNIST | Accuracy | 98.41 | PreActResNet-18 + Input Mixup |
| Image Classification | CIFAR-100 | Percentage correct | 83.2 | DenseNet-BC-190 + Mixup |
| Semi-Supervised Image Classification | SVHN, 250 Labels | Accuracy | 60.03 | MixUp |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Percentage error | 47.43 | MixUp |
| Domain Generalization | ImageNet-A | Top-1 accuracy % | 6.6 | Mixup (ResNet-50) |