Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Zheng Zhao, Jie Hu, Yujiu Yang

Date: 2024-11-26
Tasks: Panoptic Segmentation, Open Vocabulary Semantic Segmentation, Referring Video Object Segmentation, Referring Expression Segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, World Knowledge, Large Language Model

Abstract

This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite significant progress in current unified segmentation methods, limitations in adaptation to both image and video scenarios, as well as the complex reasoning segmentation, make it difficult for them to handle various challenging instructions and achieve an accurate understanding of fine-grained vision-language correlations. We propose HyperSeg, the first VLLM-based universal segmentation model for pixel-level image and video perception, encompassing generic segmentation tasks and more complex reasoning perception tasks requiring powerful reasoning abilities and world knowledge. Besides, to fully leverage the recognition capabilities of VLLMs and the fine-grained visual information, HyperSeg incorporates hybrid entity recognition and fine-grained visual perceiver modules for various segmentation tasks. Combined with the temporal adapter, HyperSeg achieves a comprehensive understanding of temporal information. Experimental results validate the effectiveness of our insights in resolving universal image and video segmentation tasks, including the more complex reasoning perception tasks. Our code is available.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Refer-YouTube-VOS | J&F | 68.5 | HyperSeg |
| Semantic Segmentation | COCO (Common Objects in Context) | mIoU | 77.2 | HyperSeg |
| Semantic Segmentation | COCO minival | PQ | 61.2 | HyperSeg (Swin-B) |
| Instance Segmentation | RefCOCO testA | Overall IoU | 85.7 | HyperSeg |
| Instance Segmentation | RefCoCo val | Overall IoU | 84.8 | HyperSeg |
| Instance Segmentation | RefCOCO testB | Overall IoU | 83.4 | HyperSeg |
| Instance Segmentation | RefCOCOg-test | Overall IoU | 78.9 | HyperSeg |
| Instance Segmentation | RefCOCO+ val | Overall IoU | 79 | HyperSeg |
| Instance Segmentation | RefCOCO+ test B | Overall IoU | 75.2 | HyperSeg |
| Instance Segmentation | DAVIS 2017 (val) | J&F 1st frame | 71.2 | HyperSeg |
| Instance Segmentation | RefCOCO+ testA | Overall IoU | 83.5 | HyperSeg |
| Instance Segmentation | RefCOCOg-val | Overall IoU | 79.4 | HyperSeg |
| Video Object Segmentation | Refer-YouTube-VOS | J&F | 68.5 | HyperSeg |
| Referring Expression Segmentation | RefCOCO testA | Overall IoU | 85.7 | HyperSeg |
| Referring Expression Segmentation | RefCoCo val | Overall IoU | 84.8 | HyperSeg |
| Referring Expression Segmentation | RefCOCO testB | Overall IoU | 83.4 | HyperSeg |
| Referring Expression Segmentation | RefCOCOg-test | Overall IoU | 78.9 | HyperSeg |
| Referring Expression Segmentation | RefCOCO+ val | Overall IoU | 79 | HyperSeg |
| Referring Expression Segmentation | RefCOCO+ test B | Overall IoU | 75.2 | HyperSeg |
| Referring Expression Segmentation | DAVIS 2017 (val) | J&F 1st frame | 71.2 | HyperSeg |
| Referring Expression Segmentation | RefCOCO+ testA | Overall IoU | 83.5 | HyperSeg |
| Referring Expression Segmentation | RefCOCOg-val | Overall IoU | 79.4 | HyperSeg |
| Open Vocabulary Semantic Segmentation | PascalVOC-20 | mIoU | 92.1 | HyperSeg |
| Open Vocabulary Semantic Segmentation | PASCAL Context-59 | mIoU | 64.6 | HyperSeg |
| 10-shot image generation | COCO (Common Objects in Context) | mIoU | 77.2 | HyperSeg |
| 10-shot image generation | COCO minival | PQ | 61.2 | HyperSeg (Swin-B) |
| Panoptic Segmentation | COCO minival | PQ | 61.2 | HyperSeg (Swin-B) |
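
The page does not define the metrics it reports. As a rough illustration only, the sketch below shows how two of them are commonly computed in the segmentation literature: "Overall IoU" as cumulative intersection over cumulative union across all samples, and PQ (panoptic quality) as the summed IoU of matched segments divided by TP + 0.5 FP + 0.5 FN. The function names `overall_iou` and `panoptic_quality` are hypothetical, and it is an assumption that the HyperSeg evaluation follows exactly these conventions.

```python
import numpy as np


def overall_iou(preds, targets):
    """Cumulative intersection over cumulative union across all samples.

    Assumption: binary masks, one predicted mask per ground-truth mask.
    """
    inter = 0
    union = 0
    for p, t in zip(preds, targets):
        p = np.asarray(p, dtype=bool)
        t = np.asarray(t, dtype=bool)
        inter += np.logical_and(p, t).sum()
        union += np.logical_or(p, t).sum()
    return float(inter / union) if union > 0 else 0.0


def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = sum of IoUs over matched (TP) segments / (TP + 0.5*FP + 0.5*FN).

    Assumption: segment matching has already been done by the caller,
    so `matched_ious` holds one IoU value per true-positive match.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return float(sum(matched_ious) / denom) if denom > 0 else 0.0


# Toy 2x2 masks (hypothetical data, not taken from any benchmark).
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
score = overall_iou(preds, targets)
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

Because overall IoU pools pixels before dividing, samples with large objects weigh more than they would in a per-sample average, which is one reason scores reported under different IoU conventions are not directly comparable.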

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)