Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun

2023-04-10 | NeurIPS 2023 11 | Image Classification, Segmentation, Semantic Segmentation, Object Detection

Abstract

This work proposes POMP, a prompt pre-training method for vision-language models. Being memory and computation efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts with over twenty-thousand classes. Once pre-trained, the prompt with a strong transferable ability can be directly plugged into a variety of visual recognition tasks including image classification, semantic segmentation, and object detection, to boost recognition performances in a zero-shot manner. Empirical evaluation shows that POMP achieves state-of-the-art performances on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% compared to CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 compared to ZSSeg). Our code is available at https://github.com/amazon-science/prompt-pretraining.
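The abstract does not say how training stays memory and computation efficient with over twenty thousand classes. One generic way to keep the cost of a softmax bounded is to score each sample against its true class plus a random subset of the other classes. The sketch below illustrates that idea only; it is an assumption made for illustration, not a description of POMP's implementation, and the names `sampled_class_cross_entropy`, `logit_fn`, and `num_sampled` are hypothetical.

```python
import math
import random


def sampled_class_cross_entropy(logit_fn, true_class, num_classes, num_sampled, rng):
    """Cross-entropy over the true class plus (num_sampled - 1) random negatives.

    logit_fn(c) returns the similarity score for class index c
    (hypothetical stand-in for an image-text similarity).
    """
    # Sample negatives uniformly without replacement, skipping the true class:
    # draw from a range one element shorter and shift indices past true_class.
    picks = rng.sample(range(num_classes - 1), num_sampled - 1)
    negatives = [c + 1 if c >= true_class else c for c in picks]

    classes = [true_class] + negatives
    logits = [logit_fn(c) for c in classes]

    # Numerically stable log-sum-exp over the sampled classes only.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))

    # Negative log-probability of the true class within the sampled subset.
    return log_z - logits[0], classes


rng = random.Random(0)
loss, classes = sampled_class_cross_entropy(
    logit_fn=lambda c: 0.0,  # uniform scores, for illustration only
    true_class=7,
    num_classes=20000,
    num_sampled=1000,
    rng=rng,
)
```

With this scheme only `num_sampled` class scores are computed per sample instead of all `num_classes`, so the per-step cost no longer grows with the size of the label set.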

Results

Task | Dataset | Metric | Value | Model
Object Detection | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
3D | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
2D Classification | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
2D Object Detection | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
Prompt Engineering | ImageNet-R | Top-1 accuracy % | 77.9 | POMP
Prompt Engineering | ImageNet-21k | Accuracy | 25.3 | POMP
Prompt Engineering | ImageNet-S | Top-1 accuracy % | 49.8 | POMP
Prompt Engineering | ImageNet-A | Top-1 accuracy % | 51.6 | POMP
Open Vocabulary Object Detection | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
Open Vocabulary Semantic Segmentation | COCO-Stuff-171 | hIoU | 39.1 | POMP
Open Vocabulary Semantic Segmentation | PascalVOC-20 | hIoU | 84.4 | POMP
Open Vocabulary Semantic Segmentation | PascalVOC-20 | mIoU | 89.4 | POMP
16k | LVIS v1.0 | AP novel-LVIS base training | 25.2 | POMP
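
The hIoU metric in the segmentation rows is conventionally the harmonic mean of the mIoU on seen classes and the mIoU on unseen classes. A minimal sketch of that formula follows, assuming this conventional definition applies here. The function name `harmonic_iou` and the input values are hypothetical and are not taken from the paper.

```python
def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mIoU (both on the same scale)."""
    total = miou_seen + miou_unseen
    if total == 0:
        # Both scores are zero; define the harmonic mean as zero.
        return 0.0
    return 2.0 * miou_seen * miou_unseen / total


# Hypothetical inputs, chosen only to show the behaviour of the formula.
example = harmonic_iou(50.0, 100.0)
```

Because the harmonic mean is pulled toward the smaller input, a model cannot reach a high hIoU by doing well on seen classes alone.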

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)