Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
Self-supervised learning algorithms based on instance discrimination train encoders to be invariant to pre-defined transformations of the same instance. While most methods treat different views of the same image as positives for a contrastive loss, we are interested in using positives from other instances in the dataset. Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), samples the nearest neighbors from the dataset in the latent space and treats them as positives. This provides more semantic variations than pre-defined transformations. We find that using the nearest neighbor as the positive in contrastive losses significantly improves ImageNet classification accuracy, from 71.7% to 75.6%, outperforming previous state-of-the-art methods. On semi-supervised learning benchmarks, we significantly improve performance when only 1% of ImageNet labels are available, from 53.8% to 56.5%. On transfer learning benchmarks, our method outperforms state-of-the-art methods (including supervised learning with ImageNet) on 8 out of 12 downstream datasets. Furthermore, we demonstrate empirically that our method is less reliant on complex data augmentations: when trained using only random crops, its ImageNet top-1 accuracy drops by a relative 2.1%.
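The core idea, replacing the augmented view of a sample with its nearest neighbor in latent space as the anchor of an InfoNCE-style loss, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the "dataset in latent space" is approximated by a hypothetical fixed-size FIFO queue of past embeddings (`SupportSet`), similarity is cosine similarity, the temperature of 0.1 is a placeholder, and only one direction of the loss is computed (a symmetric version would also swap the roles of the two views).

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class SupportSet:
    """Hypothetical fixed-size FIFO queue of past (normalized) embeddings.

    Assumption: the dataset's latent space is approximated by the most recent
    `capacity` embeddings instead of embedding the full dataset every step.
    """

    def __init__(self, capacity):
        self.queue = deque(maxlen=capacity)  # oldest entries are evicted first

    def add(self, embeddings):
        for e in embeddings:
            self.queue.append(l2_normalize(e))

    def nearest_neighbor(self, z):
        """Return the stored embedding with the highest cosine similarity to z.

        Raises ValueError if the support set is still empty.
        """
        z = l2_normalize(z)
        return max(self.queue, key=lambda s: dot(s, z))


def nn_info_nce(z1, z2, support, temperature=0.1):
    """InfoNCE loss whose anchor for sample i is NN(z1[i]) from the support set.

    z1, z2: embeddings of two augmented views of the same batch (lists of lists).
    The positive for anchor i is z2[i]; every other z2[k] acts as a negative.
    Returns the mean loss over the batch.
    """
    anchors = [support.nearest_neighbor(z) for z in z1]
    z2n = [l2_normalize(z) for z in z2]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, z) / temperature for z in z2n]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(z1)


# Usage: a toy 2-D latent space with three stored embeddings.
support = SupportSet(capacity=4)
support.add([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
z1 = [[0.9, 0.1], [0.1, 0.9]]   # view-1 embeddings of a batch of two samples
z2 = [[1.0, 0.2], [0.2, 1.0]]   # view-2 embeddings of the same two samples
loss = nn_info_nce(z1, z2, support)
print(round(loss, 4))
```

Because the anchor comes from the support set rather than from a second augmentation of the same image, the positive pair can differ in ways that no hand-designed transformation produces, which is the property the abstract attributes to NNCLR.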
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Image Classification | Stanford Cars | Accuracy | 67.1 | NNCLR |
| Image Classification | DTD | Accuracy | 75.5 | NNCLR |
| Image Classification | CIFAR-10 | Accuracy | 93.7 | NNCLR |
| Image Classification | Oxford-IIIT Pet Dataset | Accuracy | 91.8 | NNCLR |
| Image Classification | Flowers-102 | Accuracy | 95.1 | NNCLR |
| Image Classification | PASCAL VOC 2007 | Accuracy | 83.0 | NNCLR |
| Image Classification | CIFAR-100 | Accuracy | 79.0 | NNCLR |
| Image Classification | Food-101 | Accuracy | 76.7 | NNCLR |
| Image Classification | FGVC Aircraft | Accuracy | 64.1 | NNCLR |
| Image Classification | SUN397 | Accuracy | 62.5 | NNCLR |
| Image Classification | ImageNet - 10% labeled data | Top-5 Accuracy | 89.3 | NNCLR (ResNet-50) |
| Image Classification | ImageNet - 1% labeled data | Top-5 Accuracy | 80.7 | NNCLR (ResNet-50) |
| Image Classification | ImageNet | Top-5 Accuracy | 92.4 | NNCLR (ResNet-50, multi-crop) |
| Fine-Grained Image Classification | FGVC Aircraft | Accuracy | 64.1 | NNCLR |
| Fine-Grained Image Classification | SUN397 | Accuracy | 62.5 | NNCLR |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | Top-5 Accuracy | 89.3 | NNCLR (ResNet-50) |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | Top-5 Accuracy | 80.7 | NNCLR (ResNet-50) |