Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
This work proposes POMP, a prompt pre-training method for vision-language models. Being memory- and computation-efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts spanning over twenty thousand classes. Once pre-trained, the prompt transfers strongly and can be directly plugged into a variety of visual recognition tasks, including image classification, semantic segmentation, and object detection, to boost recognition performance in a zero-shot manner. Empirical evaluation shows that POMP achieves state-of-the-art performance on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% compared to CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 compared to ZSSeg). Our code is available at https://github.com/amazon-science/prompt-pretraining.
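The zero-shot transfer described above follows the standard CLIP-style recipe: each class name is wrapped with the (here, pre-trained) prompt, encoded into a text embedding, and the image is assigned to the class whose embedding is most cosine-similar to the image embedding. A minimal sketch with toy NumPy vectors — the embeddings, dimensions, and `zero_shot_classify` helper are illustrative stand-ins, not POMP's actual code, which uses CLIP's image and text encoders:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project features onto the unit sphere so dot products are cosines.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_feat, class_text_feats):
    # Cosine similarity between the image embedding and each
    # prompt-conditioned class text embedding; highest score wins.
    scores = l2_normalize(class_text_feats) @ l2_normalize(image_feat)
    return int(np.argmax(scores)), scores

# Toy stand-in embeddings (a real pipeline would encode
# "[learned prompt] + class name" with CLIP's text encoder).
class_text_feats = np.eye(3, 8)                       # 3 classes, 8-dim features
image_feat = class_text_feats[1] + 0.05 * np.ones(8)  # close to class 1

pred, scores = zero_shot_classify(image_feat, class_text_feats)
# pred == 1: the image is matched to the second class
```

Because only the shared prompt is learned, the same classifier construction extends to segmentation and detection by scoring region or pixel features against the same prompt-conditioned class embeddings.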
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Prompt Engineering | ImageNet-R | Top-1 accuracy % | 77.9 | POMP |
| Prompt Engineering | ImageNet-21k | Accuracy | 25.3 | POMP |
| Prompt Engineering | ImageNet-S | Top-1 accuracy % | 49.8 | POMP |
| Prompt Engineering | ImageNet-A | Top-1 accuracy % | 51.6 | POMP |
| Open Vocabulary Object Detection | LVIS v1.0 | AP novel (LVIS base training) | 25.2 | POMP |
| Open Vocabulary Semantic Segmentation | COCO-Stuff-171 | hIoU | 39.1 | POMP |
| Open Vocabulary Semantic Segmentation | PascalVOC-20 | hIoU | 84.4 | POMP |
| Open Vocabulary Semantic Segmentation | PascalVOC-20 | mIoU | 89.4 | POMP |