Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Your Diffusion Model is Secretly a Zero-Shot Classifier

Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

2023-03-28 | ICCV 2023
Tasks: Image Classification, Domain Generalization, Visual Reasoning, Zero-Shot Transfer Image Classification, Relational Reasoning, Image Generation, Zero-Shot Learning, Fine-Grained Image Classification

Abstract

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
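The abstract does not spell out the classification procedure, so the following is only a minimal sketch of the general idea: score each candidate class by how well a class-conditioned noise predictor reconstructs the noise added to the input, and pick the class with the lowest average error (a proxy for the highest conditional density). The names `make_toy_eps_model` and `diffusion_classify`, the linear `alphas_cumprod` schedule, and the closed-form toy noise predictor are all assumptions made for illustration; a real setup would replace the toy predictor with a text-conditioned model such as Stable Diffusion.

```python
import numpy as np


def make_toy_eps_model(alphas_cumprod):
    """Hypothetical stand-in for a conditional noise predictor eps_theta(x_t, t, c).

    Each "class" c is a mean vector and the class distribution is treated as a
    point mass at c, so the exact noise prediction is available in closed form.
    """
    def eps_model(x_t, t, c):
        a = alphas_cumprod[t]
        return (x_t - np.sqrt(a) * c) / np.sqrt(1.0 - a)
    return eps_model


def diffusion_classify(x, class_conds, eps_model, alphas_cumprod, n_trials=64, seed=0):
    """Return (predicted class index, mean noise-prediction error per class)."""
    rng = np.random.default_rng(seed)
    errors = np.zeros(len(class_conds))
    for _ in range(n_trials):
        # Sample one timestep and one noise vector, shared across all classes
        # so that the per-class errors are directly comparable.
        t = int(rng.integers(0, len(alphas_cumprod)))
        eps = rng.standard_normal(x.shape)
        a = alphas_cumprod[t]
        x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward-noised input
        for i, c in enumerate(class_conds):
            errors[i] += np.mean((eps - eps_model(x_t, t, c)) ** 2)
    errors /= n_trials
    # Lowest error corresponds to the highest estimated conditional density.
    return int(np.argmin(errors)), errors


# Toy usage: three 2-D "classes" and an input close to the second one.
alphas_cumprod = np.linspace(0.99, 0.01, 10)  # assumed noise schedule
class_means = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 3.0])]
x = np.array([2.8, 3.1])
pred, errors = diffusion_classify(
    x, class_means, make_toy_eps_model(alphas_cumprod), alphas_cumprod
)
print(pred, errors)
```

In this toy setup the error for each class is proportional to the squared distance between the input and that class mean, so the nearest mean is selected. Sharing the same timestep and noise sample across classes is a design choice that reduces the variance of the comparison; no additional training is involved, only repeated forward passes of the noise predictor.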

Results

Task                                     Dataset                       Metric              Value  Model
Domain Adaptation                        ImageNet-A                    Top-1 accuracy %    30.2   Diffusion Classifier
Visual Reasoning                         Winoground                    Text Score          34     Diffusion Classifier (zero-shot)
Image Classification                     CIFAR-10                      Percentage correct  88.5   Diffusion Classifier (zero-shot)
Image Classification                     Oxford-IIIT Pets              Per-Class Accuracy  87.3   Diffusion Classifier (zero-shot)
Image Classification                     Flowers-102                   Per-Class Accuracy  66.3   Diffusion Classifier (zero-shot)
Image Classification                     STL-10                        Percentage correct  95.4   Diffusion Classifier (zero-shot)
Image Classification                     ObjectNet (ImageNet classes)  Top 1 Accuracy      43.4   Diffusion Classifier (zero-shot)
Image Classification                     ObjectNet (ImageNet classes)  Top 1 Accuracy      33.9   Diffusion Classifier
Image Classification                     FGVC Aircraft                 Accuracy            26.4   Diffusion Classifier (zero-shot)
Fine-Grained Image Classification        FGVC Aircraft                 Accuracy            26.4   Diffusion Classifier (zero-shot)
Zero-Shot Transfer Image Classification  ImageNet                      Accuracy (Private)  61.4   Diffusion Classifier (zero-shot)
Zero-Shot Transfer Image Classification  Food-101                      Top 1 Accuracy      77.7   Diffusion Classifier (zero-shot)
Domain Generalization                    ImageNet-A                    Top-1 accuracy %    30.2   Diffusion Classifier
