Stanislav Fort, Jie Ren, Balaji Lakshminarayanan
Near out-of-distribution (OOD) detection is a major challenge for deep neural networks. We demonstrate that large-scale pre-trained transformers can significantly improve the state of the art (SOTA) on a range of near-OOD tasks across different data modalities. For instance, on CIFAR-100 vs CIFAR-10 OOD detection, we improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transformers pre-trained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pre-training. To further improve performance, we explore the few-shot outlier exposure setting, where a few examples from outlier classes may be available. We show that pre-trained transformers are particularly well suited for outlier exposure: the AUROC of OOD detection on CIFAR-100 vs CIFAR-10 can be improved to 98.7% with just 1 image per OOD class, and to 99.46% with 10 images per OOD class. For multi-modal image-text pre-trained transformers such as CLIP, we explore a new way of using only the names of outlier classes as the sole source of information, without any accompanying images, and show that this outperforms the previous SOTA on standard vision OOD benchmark tasks.
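One way the name-only idea for image-text models could be realized is sketched below. This is an illustrative assumption, not the paper's confirmed procedure: the function `outlier_name_score`, the `temperature` value, and the toy 2-D embeddings are all hypothetical, and a real setup would take the embeddings from a model such as CLIP. The sketch compares an image embedding against text embeddings of in-distribution and outlier class names, then uses the softmax mass that falls on the outlier names as the OOD score.

```python
import numpy as np

def outlier_name_score(image_emb, in_name_embs, out_name_embs, temperature=0.01):
    """Hypothetical OOD score: softmax mass assigned to outlier class names.

    All arguments are plain arrays; in practice they would be embeddings
    produced by an image-text model (assumption, not shown here).
    """
    image_emb = np.asarray(image_emb, dtype=float)
    names = np.concatenate(
        [np.asarray(in_name_embs, dtype=float), np.asarray(out_name_embs, dtype=float)],
        axis=0,
    )
    # Cosine similarity between the image and every class-name embedding.
    names = names / np.linalg.norm(names, axis=1, keepdims=True)
    image = image_emb / np.linalg.norm(image_emb)
    logits = names @ image / temperature
    # Numerically stable softmax over in-distribution and outlier names together.
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # Higher value = more probability mass on the outlier names.
    return float(probs[len(in_name_embs):].sum())

# Toy example with made-up 2-D embeddings (one in-distribution name, one outlier name).
in_names = [[1.0, 0.0]]
out_names = [[0.0, 1.0]]
score_in_like = outlier_name_score([1.0, 0.1], in_names, out_names)
score_out_like = outlier_name_score([0.1, 1.0], in_names, out_names)
```

In this toy setup, an image embedding close to the in-distribution name receives a score near 0 and one close to the outlier name receives a score near 1; no outlier images are needed, only the names.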
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUPR | 97.75 | R+ViT finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUROC | 98.52 | R+ViT finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUPR | 97.68 | ViT finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUROC | 98.42 | ViT finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUPR | 96.28 | MLP-Mixer finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUROC | 97.85 | MLP-Mixer finetuned on CIFAR-10 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 98.11 | Ensemble of ViTs |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 97.98 | ViT-L_16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUPR | 92.08 | R50+ViT_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 96.23 | R50+ViT_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUPR | 91.89 | ViT_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 95.53 | ViT_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUPR | 90.22 | MLP-Mixer_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 95.31 | MLP-Mixer_B-16 finetuned on CIFAR-100 |
| Out-of-Distribution Detection | CIFAR-100 vs CIFAR-10 | AUROC | 94.68 | CLIP using class name words describing the two distributions |
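For readers unfamiliar with the two metrics in the table, the following minimal sketch shows how AUROC and AUPR can be computed from per-example OOD scores. It assumes that higher scores mean "more likely OOD" and that OOD examples are the positive class; the source does not state which convention was used, and the function names `auroc` and `average_precision` are illustrative. The functions return fractions in [0, 1], whereas the table reports percentages.

```python
import numpy as np

def auroc(scores_in, scores_out):
    """Probability that a random OOD example scores higher than a random
    in-distribution example (ties count as one half)."""
    scores_in = np.asarray(scores_in, dtype=float)
    scores_out = np.asarray(scores_out, dtype=float)
    diff = scores_out[:, None] - scores_in[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def average_precision(scores_in, scores_out):
    """Average precision, a common estimate of AUPR, with OOD as the positive
    class (assumed). Ties are broken by input order."""
    scores_in = np.asarray(scores_in, dtype=float)
    scores_out = np.asarray(scores_out, dtype=float)
    scores = np.concatenate([scores_in, scores_out])
    labels = np.concatenate([np.zeros(len(scores_in)), np.ones(len(scores_out))])
    # Rank examples from highest to lowest OOD score.
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    # Precision at each rank, averaged over the ranks that hold a positive.
    precision = np.cumsum(labels) / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

# Toy scores: in-distribution examples score low, OOD examples score high.
perfect_auroc = auroc([0.1, 0.2], [0.8, 0.9])
mixed_auroc = auroc([0.1, 0.9], [0.5])
mixed_ap = average_precision([0.1, 0.9], [0.5])
```

The pairwise AUROC above is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for full test sets.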