Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello

Published 2023-03-08, CVPR 2023

Tasks: Panoptic Segmentation, Open Vocabulary Semantic Segmentation, Zero Shot Segmentation, Segmentation, Semantic Segmentation, Open Vocabulary Panoptic Segmentation, Open-World Instance Segmentation


Abstract

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images from diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both models to perform panoptic segmentation of any category in the wild. Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks. In particular, with COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, an absolute improvement of 8.3 PQ and 7.9 mIoU over the previous state of the art. We open-source our code and models at https://github.com/NVlabs/ODISE.
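The abstract relies on text-image models such as CLIP being able to classify regions into open-vocabulary labels. As a minimal sketch of that general idea, and not ODISE's actual implementation, the snippet below scores one mask embedding against a set of label text embeddings by cosine similarity and turns the scores into probabilities with a temperature-scaled softmax. The `classify_mask` helper, the toy 3-D vectors, and the temperature of 0.01 are hypothetical placeholders; in a real system the vectors would come from frozen image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_mask(mask_embedding, label_embeddings, temperature=0.01):
    """Hypothetical helper: probability per label for one predicted mask.

    `mask_embedding` stands in for a pooled feature of one mask;
    `label_embeddings` maps each category name to a (hypothetical) text embedding.
    The temperature value is an illustrative assumption.
    """
    names = list(label_embeddings)
    logits = [cosine_similarity(mask_embedding, label_embeddings[n]) / temperature
              for n in names]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy embeddings; real image/text embeddings are far higher-dimensional.
labels = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}
probs = classify_mask([0.9, 0.1, 0.0], labels)
print(max(probs, key=probs.get))  # prints "cat"
```

Because the vocabulary is just the set of keys in `labels`, adding a new category only requires computing one more text embedding; no classifier weights are retrained, which is what makes this style of classification open-vocabulary.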

Results

Task	Dataset	Metric	Value	Model
Open Vocabulary Panoptic Segmentation	ADE20K	PQ	23.4	ODISE (Caption)
Open Vocabulary Panoptic Segmentation	ADE20K	PQ	22.6	ODISE (Label)
Instance Segmentation	UVO	AR (mask)	57.7	ODISE
Zero Shot Segmentation	Segmentation in the Wild	Mean AP	38.7	ODISE
Open Vocabulary Semantic Segmentation	ADE20K-847	mIoU	11.1	ODISE
Open Vocabulary Semantic Segmentation	PASCAL Context-459	mIoU	14.5	ODISE
Open Vocabulary Semantic Segmentation	PascalVOC-20	mIoU	84.6	ODISE
Open Vocabulary Semantic Segmentation	PASCAL Context-59	mIoU	57.3	ODISE
Open Vocabulary Semantic Segmentation	ADE20K-150	mIoU	29.9	ODISE
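
The panoptic rows above report PQ (panoptic quality). As a simplified sketch of the standard PQ definition from the original panoptic segmentation benchmark, the snippet below computes PQ for a single category: a predicted segment and a ground-truth segment count as a true positive when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5 FP + 0.5 FN. Segments are represented here as sets of pixel ids, and the function names are hypothetical. The official evaluation additionally accumulates counts over the whole dataset per category, averages over categories, and handles void and crowd regions, all of which this sketch omits.

```python
def iou(a, b):
    """Intersection-over-union of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """Simplified PQ for one category.

    A prediction matches a ground-truth segment (true positive) when IoU > 0.5;
    for non-overlapping segments each segment can match at most one other.
    PQ = sum(matched IoUs) / (TP + 0.5 * FP + 0.5 * FN).
    """
    matched_ious = []
    matched_pred, matched_gt = set(), set()
    for i, p in enumerate(pred_segments):
        for j, g in enumerate(gt_segments):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > 0.5:
                matched_ious.append(v)
                matched_pred.add(i)
                matched_gt.add(j)
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - len(matched_pred)  # unmatched predictions
    fn = len(gt_segments) - len(matched_gt)      # unmatched ground truth
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0


# One correct-ish prediction (IoU 0.75), one spurious prediction, one missed segment.
gt = [{0, 1, 2, 3}, {10, 11}]
pred = [{0, 1, 2}, {20, 21}]
print(panoptic_quality(pred, gt))  # prints 0.375
```

In the example, TP = 1 with IoU 3/4, FP = 1 and FN = 1, so PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375. The formula rewards both segmentation quality (the IoU sum) and recognition quality (penalizing unmatched segments on either side).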
