Jingyao Li, Pengguang Chen, Shaozuo Yu, Zexin He, Shu Liu, Jiaya Jia
The core of out-of-distribution (OOD) detection is to learn an in-distribution (ID) representation that is distinguishable from OOD samples. Previous work applied recognition-based methods to learn the ID features, which tend to learn shortcuts instead of comprehensive representations. In this work, we find, surprisingly, that simply using reconstruction-based methods can significantly boost OOD detection performance. We explore the main contributors to OOD detection in depth and find that reconstruction-based pretext tasks can provide a generally applicable and efficacious prior, which helps the model learn the intrinsic data distribution of the ID dataset. Specifically, we adopt Masked Image Modeling as the pretext task for our OOD detection framework (MOOD). Without bells and whistles, MOOD outperforms the previous SOTA on one-class OOD detection by 5.7%, on multi-class OOD detection by 3.0%, and on near-distribution OOD detection by 2.1%. It even defeats 10-shot-per-class outlier-exposure OOD detection, although we include no OOD samples in our detection.
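To make the pretext task concrete, below is a minimal PyTorch sketch of the kind of masked-image-modeling pretraining the abstract describes: random patches are masked, a small transformer encodes the visible signal, and a reconstruction loss is computed on the masked patches only. The `TinyMIM` class, architecture sizes, patch size, and mask ratio are illustrative assumptions for this sketch, not the paper's actual configuration (MOOD builds on a full ViT-scale MIM pipeline).

```python
import torch
import torch.nn as nn


class TinyMIM(nn.Module):
    """Toy masked-image-modeling model: mask patches, encode, reconstruct."""

    def __init__(self, patch=16, dim=192, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.embed = nn.Linear(3 * patch * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, 3 * patch * patch)

    def patchify(self, imgs):
        # (B, 3, H, W) -> (B, N, 3 * patch * patch) non-overlapping patches
        b, c, _, _ = imgs.shape
        p = self.patch
        x = imgs.unfold(2, p, p).unfold(3, p, p)
        return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=imgs.device) < self.mask_ratio
        tokens = self.embed(patches)
        # Zero out masked tokens so the encoder only sees visible content.
        tokens = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = self.decoder(self.encoder(tokens))
        # Pixel reconstruction loss on the masked patches only, as in MIM.
        per_patch = ((recon - patches) ** 2).mean(dim=-1)
        return (per_patch * mask).sum() / mask.sum().clamp(min=1)


model = TinyMIM()
loss = model(torch.randn(2, 3, 224, 224))  # ID images only; no OOD data used
loss.backward()
```

At detection time, MOOD scores samples with a distance metric on the features of the pretrained (and ID-fine-tuned) model, e.g. the Mahalanobis distance to class-conditional ID statistics; no OOD samples are required at any stage.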
| Task | Datasets (ID vs. OOD) | Metric | Value (%) | Model |
|---|---|---|---|---|
| Out-of-Distribution Detection | ImageNet-1k vs iNaturalist | AUROC | 86.9 | MOOD |
| Out-of-Distribution Detection | ImageNet-1k vs Textures | AUROC | 91.3 | MOOD |
| Out-of-Distribution Detection | ImageNet-1k vs Places | AUROC | 88.5 | MOOD |
| Out-of-Distribution Detection | ImageNet-1k vs SUN | AUROC | 89.8 | MOOD |
| Out-of-Distribution Detection | ImageNet-1k vs Curated OODs (avg.) | AUROC | 89.1 | MOOD |
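For reference, AUROC measures how well the detection score separates ID from OOD samples (100% means perfect separation), and the final row is the average over the four curated OOD sets above. A minimal sketch of how such a number is computed, using synthetic stand-in scores (the score function itself is not shown; higher is assumed to mean more in-distribution):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
id_scores = rng.normal(1.0, 1.0, size=5000)   # stand-in scores on ImageNet-1k (ID)
ood_scores = rng.normal(0.0, 1.0, size=5000)  # stand-in scores on, e.g., iNaturalist (OOD)

labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
scores = np.concatenate([id_scores, ood_scores])
print(100.0 * roc_auc_score(labels, scores))  # AUROC in %, as reported in the table
```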