Simon Damm, Mike Laszkiewicz, Johannes Lederer, Asja Fischer
Recent advances in multimodal foundation models have set new standards in few-shot anomaly detection. This paper explores whether high-quality visual features alone are sufficient to rival existing state-of-the-art vision-language models. We affirm this by adapting DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. We show that this approach not only rivals existing techniques but can even outperform them in many settings. Our proposed vision-only approach, AnomalyDINO, is based on patch similarities and enables both image-level anomaly prediction and pixel-level anomaly segmentation. The approach is methodologically simple and training-free and thus does not require any additional data for fine-tuning or meta-learning. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%). The reduced overhead, coupled with its outstanding few-shot performance, makes AnomalyDINO a strong candidate for fast deployment, e.g., in industrial contexts.
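To make the idea of patch-similarity scoring concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each image has already been encoded into a grid of patch feature vectors (random vectors stand in for DINOv2 features here), that a test patch is scored by its cosine distance to the nearest patch from the reference ("shot") images, and that the image-level score is the mean of the highest 1% of patch scores. The function names, the distance, and the aggregation rule are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def patch_anomaly_scores(test_patches, reference_patches):
    """Cosine distance from every test patch to its nearest reference patch.

    test_patches:      (n_test, dim) patch features of the test image
    reference_patches: (n_ref, dim) patch features pooled from the few
                       nominal reference images (the "memory bank")
    """
    t = l2_normalize(test_patches)
    r = l2_normalize(reference_patches)
    similarity = t @ r.T                  # (n_test, n_ref) cosine similarities
    return 1.0 - similarity.max(axis=1)   # distance to the nearest neighbour


def image_score(patch_scores, top_fraction=0.01):
    """Aggregate patch scores into one image-level score.

    Assumption: mean of the highest `top_fraction` of patch scores.
    """
    k = max(1, int(round(top_fraction * patch_scores.size)))
    return float(np.sort(patch_scores)[-k:].mean())


def anomaly_map(patch_scores, grid_hw):
    """Reshape per-patch scores to the patch grid for pixel-level use."""
    return patch_scores.reshape(grid_hw)


# Toy data: random vectors replace real patch features (assumption).
rng = np.random.default_rng(0)
grid_hw = (16, 16)
dim = 32
n_patches = grid_hw[0] * grid_hw[1]

reference = rng.normal(size=(n_patches, dim))
normal_test = reference + 0.01 * rng.normal(size=reference.shape)
anomalous_test = normal_test.copy()
anomalous_test[:4] = rng.normal(size=(4, dim))  # four "defective" patches

normal_scores = patch_anomaly_scores(normal_test, reference)
anomalous_scores = patch_anomaly_scores(anomalous_test, reference)

normal_image_score = image_score(normal_scores)
anomalous_image_score = image_score(anomalous_scores)
segmentation = anomaly_map(anomalous_scores, grid_hw)
```

In this sketch the same per-patch distances serve both tasks: reshaped to the patch grid they form a coarse segmentation map (which would be upsampled to image resolution in practice), and aggregated they give a single detection score. No parameters are fitted, which mirrors the training-free setting described above.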
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Anomaly Detection | MVTec AD | Detection AUROC | 99.5 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUPRO | 95 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUROC | 98.2 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | MVTec AD | Detection AUROC | 97.7 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUPRO | 93.4 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUROC | 97.2 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | MVTec AD | Detection AUROC | 96.9 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUPRO | 93.1 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUROC | 97 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | MVTec AD | Detection AUROC | 96.6 | AnomalyDINO-S (1-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUPRO | 92.7 | AnomalyDINO-S (1-shot) |
| Anomaly Detection | MVTec AD | Segmentation AUROC | 96.8 | AnomalyDINO-S (1-shot) |
| Anomaly Detection | VisA | Detection AUROC | 97.6 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | VisA | Segmentation AUPRO (until 30% FPR) | 96.1 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | VisA | Segmentation AUROC | 98.8 | AnomalyDINO-S (full-shot) |
| Anomaly Detection | VisA | Detection AUROC | 92.6 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | VisA | Segmentation AUPRO (until 30% FPR) | 94.1 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | VisA | Segmentation AUROC | 98.2 | AnomalyDINO-S (4-shot) |
| Anomaly Detection | VisA | Detection AUROC | 89.7 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | VisA | Segmentation AUPRO (until 30% FPR) | 93.4 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | VisA | Segmentation AUROC | 98 | AnomalyDINO-S (2-shot) |
| Anomaly Detection | VisA | Detection AUROC | 87.4 | AnomalyDINO-S (1-shot) |
| Anomaly Detection | VisA | Segmentation AUPRO (until 30% FPR) | 92.5 | AnomalyDINO-S (1-shot) |
| Anomaly Detection | VisA | Segmentation AUROC | 97.8 | AnomalyDINO-S (1-shot) |