Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Generative Prompt Model for Weakly Supervised Object Localization

Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

2023-07-19 · ICCV 2023
Tasks: Denoising, Image Denoising, Object Localization, Weakly-Supervised Object Localization, Language Modelling
Paper · PDF · Code (official)

Abstract

Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels alone. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings, which are fed to a generative model to conditionally recover the noised input image and thereby learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) to obtain both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale, high-quality attention maps, which facilitate localizing the full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), respectively, setting a solid baseline for WSOL with generative models. Code is available at https://github.com/callsys/GenPromp.
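The inference procedure the abstract describes, blending a learned representative embedding with a discriminative embedding and then thresholding the resulting attention map into a bounding box, can be sketched in plain NumPy. The blend weight `w`, the function names, and the toy attention map below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def combine_embeddings(rep_emb, disc_emb, w=0.5):
    """Linearly blend the learned representative embedding with the
    discriminative embedding queried from a vision-language model.
    The blend weight w is a hypothetical hyperparameter."""
    return w * rep_emb + (1.0 - w) * disc_emb

def attention_to_box(attn, thresh=0.5):
    """Binarize an (H, W) attention map relative to its peak value and
    return the tight bounding box (x0, y0, x1, y1) around the
    above-threshold activations."""
    mask = attn >= thresh * attn.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a synthetic attention map with one bright rectangular region.
attn = np.zeros((8, 8))
attn[2:6, 3:7] = 1.0
print(attention_to_box(attn))  # (3, 2, 6, 5)
```

In the actual pipeline the attention maps come from the cross-attention layers of a diffusion model conditioned on the combined prompt embedding; the thresholding step above mirrors the standard WSOL practice of extracting a box from an activation map.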

Results

Task                 Dataset        Metric                           Value   Model
Object Localization  ImageNet       GT-known Localization Accuracy   75      Stable Diffusion
Object Localization  ImageNet       Top-1 Localization Accuracy      65.2    Stable Diffusion
Object Localization  CUB-200-2011   GT-known Localization Accuracy   98      Stable Diffusion
Object Localization  CUB-200-2011   Top-1 Localization Accuracy      87      Stable Diffusion
Object Localization  CUB-200-2011   Top-1 Localization Accuracy      87      GenPromp

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)