PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai

2024-03-21Referring Expression Panoptic Segmentation Generalized Referring Expression Segmentation Referring Expression Segmentation Segmentation Semantic Segmentation Video Object Segmentation Video Semantic Segmentation Image Segmentation

Paper PDF Code(official)

Abstract

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges. To overcome the limitation of the LMM being limited to textual output, PSALM incorporates a mask decoder and a well-designed input schema to handle a variety of segmentation tasks. This schema includes images, task instructions, conditional prompts, and mask tokens, which enable the model to generate and classify segmentation masks effectively. The flexible design of PSALM supports joint training across multiple datasets and tasks, leading to improved performance and task generalization. PSALM achieves superior results on several benchmarks, such as RefCOCO/RefCOCO+/RefCOCOg, COCO Panoptic Segmentation, and COCO-Interactive, and further exhibits zero-shot capabilities on unseen tasks, such as open-vocabulary segmentation, generalized referring expression segmentation and video object segmentation, making a significant step towards a GPT moment in computer vision. Through extensive experiments, PSALM demonstrates its potential to transform the domain of image segmentation, leveraging the robust visual understanding capabilities of LMMs as seen in natural language processing. Code and models are available at https://github.com/zamling/PSALM.

Results

Task	Dataset	Metric	Value	Model
Instance Segmentation	RefCoCo val	Overall IoU	83.6	PSALM
Referring Expression Segmentation	RefCoCo val	Overall IoU	83.6	PSALM

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21 Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17 DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17 From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17 Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17 SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17 Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17 A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique2025-07-17