Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou
This paper considers the problem of inferring image labels when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network trained on separate classes for which training examples are abundant. We consider a semi-supervised setting that exploits a large collection of unlabelled images to support label propagation, made possible by recent advances in large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images yields state-of-the-art accuracy in the low-shot learning regime.
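To make the diffusion step concrete, below is a minimal sketch of label propagation on a kNN similarity graph, in the spirit of what the abstract describes. It is a toy in-memory version: the dense similarity matrix, the function name `knn_label_diffusion`, and the parameter choices (`k`, `alpha`, `iters`) are illustrative assumptions, not the authors' exact formulation; the paper builds approximate kNN graphs at far larger scale (e.g., with Faiss).

```python
import numpy as np

def knn_label_diffusion(features, labels, k=10, alpha=0.75, iters=20):
    """Toy label-propagation sketch on a kNN similarity graph.

    features: (n, d) L2-normalized descriptors.
    labels:   (n,) int class ids, -1 for unlabelled points.
    Returns:  (n, c) class scores after diffusion.
    Parameters are illustrative, not the paper's settings.
    """
    n = features.shape[0]
    sims = features @ features.T              # cosine similarities
    np.fill_diagonal(sims, -np.inf)           # exclude self-edges
    idx = np.argsort(-sims, axis=1)[:, :k]    # k nearest neighbors per node

    # Row-stochastic transition matrix W over the kNN graph
    W = np.zeros((n, n))
    rows = np.arange(n)[:, None]
    W[rows, idx] = np.maximum(sims[rows, idx], 0.0)
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    # One-hot seed matrix Y for the few labelled (seed) images
    c = labels.max() + 1
    Y = np.zeros((n, c))
    seeds = labels >= 0
    Y[seeds, labels[seeds]] = 1.0

    # Diffusion iterations: L <- alpha * W L + (1 - alpha) * Y
    L = Y.copy()
    for _ in range(iters):
        L = alpha * (W @ L) + (1 - alpha) * Y
    return L
```

At scale, the dense `sims` matrix is replaced by a sparse graph whose edges come from an approximate nearest-neighbor index, and the diffusion reduces to repeated sparse matrix-vector products.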
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Few-Shot Image Classification | ImageNet-FS (5-shot, all) | Top-5 Accuracy (%) | 73.8 | LSD (ResNet-50) |
| Few-Shot Image Classification | ImageNet-FS (1-shot, novel) | Top-5 Accuracy (%) | 57.7 | LSD (ResNet-50) |
| Few-Shot Image Classification | ImageNet-FS (2-shot, novel) | Top-5 Accuracy (%) | 66.9 | LSD (ResNet-50) |