Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training

Shuai Zhao, Liguang Zhou, Wenxiao Wang, Deng Cai, Tin Lun Lam, Yangsheng Xu

2020-11-30 · Image Classification
Paper · PDF · Code (official)

Abstract

The width of a neural network matters, since increasing the width necessarily increases model capacity. However, the performance of a network does not improve linearly with its width and soon saturates. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To prove it, one large network is divided into several small ones with respect to its parameters and regularization components, so that each small network has a fraction of the original network's parameters. We then train these small networks together and make them see various views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., better accuracy-efficiency trade-offs. The small networks can also achieve faster inference than the large one by running concurrently. All of the above shows that the number of networks is a new dimension of model scaling. We validate our argument with 8 different neural architectures on common benchmarks through extensive experiments. The code is available at https://github.com/FreeformRobotics/Divide-and-Co-training.
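The split described above can be sketched with a toy calculation. Since the parameter count of a dense block grows roughly with the square of its width, dividing the width by sqrt(S) keeps S small networks at the same total parameter budget as the one large network; at inference, the small networks' outputs are averaged into an ensemble prediction. This is a minimal illustrative sketch, not the paper's implementation; the widths, functions, and numbers below are our own assumptions.

```python
import math

def dense_params(width):
    """Parameters of a single width x width dense block (toy cost model)."""
    return width * width

def small_width(width, s):
    """Width of each of the S small networks under an equal-parameter split."""
    return round(width / math.sqrt(s))

# Example: splitting a width-64 block into S=4 small blocks of width 32
# keeps the total parameter count unchanged: 4 * 32^2 == 64^2.
W, S = 64, 4
w_small = small_width(W, S)
assert S * dense_params(w_small) == dense_params(W)

def ensemble(predictions):
    """Average per-class scores from the S small networks (hypothetical)."""
    n = len(predictions)
    return [sum(p[c] for p in predictions) / n for c in range(len(predictions[0]))]

# Two small networks' class scores averaged into one ensemble prediction.
print(ensemble([[0.75, 0.25], [0.5, 0.5]]))
```

The sqrt(S) scaling only holds exactly for layers whose cost is quadratic in width; the paper's actual division also accounts for regularization components, which this sketch omits.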

Results

Task | Dataset | Metric | Value | Model
Image Classification | CIFAR-10 | Percentage correct | 98.71 | PyramidNet-272, S=4
Image Classification | CIFAR-10 | Percentage correct | 98.38 | WRN-40-10, S=4
Image Classification | CIFAR-10 | Percentage correct | 98.32 | WRN-28-10, S=4
Image Classification | CIFAR-10 | Percentage correct | 98.31 | Shake-Shake 26 2x96d, S=4
Image Classification | CIFAR-100 | Percentage correct | 89.46 | PyramidNet-272, S=4
Image Classification | CIFAR-100 | Percentage correct | 87.44 | DenseNet-BC-190, S=4
Image Classification | CIFAR-100 | Percentage correct | 86.9 | WRN-40-10, S=4
Image Classification | CIFAR-100 | Percentage correct | 85.74 | WRN-28-10, S=4
Image Classification | ImageNet | GFLOPs | 38.2 | SE-ResNeXt-101, 64x4d, S=2 (320px)
Image Classification | ImageNet | GFLOPs | 61.1 | SE-ResNeXt-101, 64x4d, S=2 (416px)
Image Classification | ImageNet | GFLOPs | 18.8 | ResNeXt-101, 64x4d, S=2 (224px)

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks (2025-07-14)
FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise (2025-07-13)