How to Use Dropout Correctly on Residual Networks with Batch Normalization

Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Donggeon Lee, Sang Woo Kim

2023-02-13Image Classification Classification Fine-Grained Image Classification

Abstract

For the stable optimization of deep neural networks, regularization methods such as dropout and batch normalization have been used in various tasks. Nevertheless, the correct position to apply dropout has rarely been discussed, and different positions have been employed depending on the practitioners. In this study, we investigate the correct position to apply dropout. We demonstrate that for a residual network with batch normalization, applying dropout at certain positions increases the performance, whereas applying dropout at other positions decreases the performance. Based on theoretical analysis, we provide the following guideline for the correct position to apply dropout: apply one dropout after the last batch normalization but before the last weight layer in the residual branch. We provide detailed theoretical explanations to support this claim and demonstrate them through module tests. In addition, we investigate the correct position of dropout in the head that produces the final prediction. Although the current consensus is to apply dropout after global average pooling, we prove that applying dropout before global average pooling leads to a more stable output. The proposed guidelines are validated through experiments using different datasets and models.

Results

Task	Dataset	Metric	Value	Model
Image Classification	CIFAR-10	Percentage correct	94.4367	PreResNet-110
Image Classification	CIFAR-100	Percentage correct	73.98	PreResNet-110
Image Classification	Oxford-IIIT Pets	Accuracy	85.5897	PreResNet-101
Fine-Grained Image Classification	Oxford-IIIT Pets	Accuracy	85.5897	PreResNet-101

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations2025-07-18 Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17 Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy2025-07-17 Federated Learning for Commercial Image Sources2025-07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17 Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation2025-07-16 Safeguarding Federated Learning-based Road Condition Classification2025-07-16 Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking2025-07-15