Lorenzo Brigato, Björn Barz, Luca Iocchi, Joachim Denzler
Data-efficient image classification using deep neural networks in settings where only small amounts of labeled data are available has been an active research area in the recent past. However, an objective comparison between published methods is difficult, since existing works use different datasets for evaluation and often compare against untuned baselines with default hyper-parameters. We design a benchmark for data-efficient image classification consisting of six diverse datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). Using this benchmark, we re-evaluate the standard cross-entropy baseline and eight methods for data-efficient deep learning published between 2017 and 2021 at renowned venues. For a fair and realistic comparison, we carefully tune the hyper-parameters of all methods on each dataset. Surprisingly, we find that tuning learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline, which outperforms all but one specialized method and is competitive with the remaining one.
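To make the tuning step concrete, the following is a minimal sketch of searching over learning rate, weight decay, and batch size against a held-out validation split. It uses plain random search, which is an assumption for illustration and not necessarily the search procedure used in the benchmark. The search ranges, the function names `sample_config` and `random_search`, and the `toy_objective` stand-in for real training are all hypothetical.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration. The ranges are illustrative assumptions."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),            # log-uniform learning rate
        "weight_decay": 10 ** rng.uniform(-5, -1),  # log-uniform weight decay
        "batch_size": rng.choice([8, 16, 32, 64]),
    }


def random_search(train_and_validate, n_trials=20, seed=0):
    """Return the best configuration and its validation score.

    `train_and_validate(config)` is a user-supplied callable that trains on
    the training split and returns a score measured on the validation split
    (higher is better).
    """
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        score = train_and_validate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Stand-in for real training: peaks at lr=1e-2, weight_decay=1e-3."""
    return (-abs(math.log10(config["lr"]) + 2)
            - abs(math.log10(config["weight_decay"]) + 3))


best_config, best_score = random_search(toy_objective, n_trials=50, seed=0)
```

In practice `toy_objective` would be replaced by a routine that trains the model with the sampled configuration and evaluates it on the validation split, with the test split left untouched until the final configuration is fixed.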
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 68.7 | Harmonic Networks |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 67.9 | Cross-entropy baseline |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 64.92 | Cosine + Cross-Entropy Loss |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 64.67 | T-vMF Similarity |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 64.64 | DSK Networks |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 64.15 | OLÉ |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 62.73 | Cosine Loss |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 62.06 | Full Convolution |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 60.33 | Deep Hybrid Networks |
| Image Classification | DEIC Benchmark | Average Balanced Accuracy (across datasets) | 55.47 | Grad-l2 Penalty |
| Image Classification | ImageNet, 50 samples per class | 1:1 Accuracy | 46.36 | Harmonic Networks |
| Image Classification | ImageNet, 50 samples per class | 1:1 Accuracy | 45.21 | DSK Networks |
| Image Classification | ImageNet, 50 samples per class | 1:1 Accuracy | 44.97 | Cross-entropy baseline |
| Image Classification | CUB-200-2011, 30 samples per class | Accuracy | 72.26 | Harmonic Networks (no pre-training) |
| Image Classification | CUB-200-2011, 30 samples per class | Accuracy | 71.44 | Cross-entropy baseline (no pre-training) |
| Image Classification | CUB-200-2011, 30 samples per class | Accuracy | 71.02 | DSK Networks (no pre-training) |
| Image Classification | EuroSAT, 50 samples per class | Accuracy | 92.09 | Harmonic Networks |
| Image Classification | EuroSAT, 50 samples per class | Accuracy | 91.25 | DSK Networks |
| Image Classification | EuroSAT, 50 samples per class | Accuracy | 91.15 | Deep Hybrid Networks |
| Image Classification | ciFAIR-10, 50 samples per class | Accuracy | 58.22 | Cross-entropy baseline |
| Image Classification | ciFAIR-10, 50 samples per class | Accuracy | 57.5 | T-vMF Similarity |
| Image Classification | ciFAIR-10, 50 samples per class | Accuracy | 56.5 | Harmonic Networks |
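
The first rows of the table report balanced accuracy averaged across datasets. As a rough illustration, balanced accuracy is commonly defined as the mean of per-class recall, so that rare classes count as much as frequent ones. The sketch below assumes that definition and an unweighted mean over datasets; the function names and the toy labels are hypothetical and are not taken from the benchmark code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def average_balanced_accuracy(results):
    """Unweighted mean of balanced accuracy over datasets.

    `results` maps a dataset name to a (y_true, y_pred) pair.
    """
    scores = [balanced_accuracy(t, p) for t, p in results.values()]
    return sum(scores) / len(scores)


# Toy labels, made up for illustration only.
toy_results = {
    "dataset_a": ([0, 0, 0, 1], [0, 0, 1, 1]),  # recalls: 2/3 and 1
    "dataset_b": ([0, 1], [0, 1]),              # recalls: 1 and 1
}
avg = average_balanced_accuracy(toy_results)
```

On `dataset_a`, plain accuracy would be 3/4, while balanced accuracy is (2/3 + 1) / 2 = 5/6, which shows how the metric weights the minority class equally.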