
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang

2024-05-24 · Referring Expression · Semantic Correspondence · Generalized Referring Expression Segmentation · Referring Expression Segmentation
Paper · PDF · Code (official)

Abstract

The newly proposed Generalized Referring Expression Segmentation (GRES) task amplifies the formulation of classic RES by involving complex multiple- and non-target scenarios. Recent approaches address GRES by directly extending well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent comprehensive objects of different granularity. Moreover, simple binary object-existence identification across all referent scenarios fails to capture their inherent differences, incurring ambiguity in object understanding. To tackle these issues, we propose a Counting-aware Hierarchical Decoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of its hierarchical nature. Furthermore, we incorporate counting ability by embodying multiple/single/non-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD, which outperforms state-of-the-art GRES methods by a remarkable margin. Code is available at https://github.com/RobertLuo1/CoHD.
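For intuition, here is a minimal PyTorch sketch of the count-level supervision the abstract describes: a classification head that predicts whether an expression refers to no target, a single target, or multiple targets, trained alongside the usual mask decoder. This is not the authors' implementation; all names (CountingAwareHead, counting_loss) and the mean-pooling choice are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of count-level supervision (not the CoHD code):
# classify each expression as non-target / single-target / multi-target.
class CountingAwareHead(nn.Module):
    def __init__(self, dim: int = 256, num_count_classes: int = 3):
        super().__init__()
        self.count_head = nn.Linear(dim, num_count_classes)

    def forward(self, fused_feats: torch.Tensor) -> torch.Tensor:
        # fused_feats: (B, N, dim) fused visual-linguistic tokens
        pooled = fused_feats.mean(dim=1)   # (B, dim) global summary
        return self.count_head(pooled)     # (B, 3) count logits

def counting_loss(count_logits, num_targets):
    # Map ground-truth instance counts to {0: none, 1: single, 2: multiple}.
    labels = num_targets.clamp(max=2)
    return F.cross_entropy(count_logits, labels)

if __name__ == "__main__":
    head = CountingAwareHead()
    feats = torch.randn(4, 100, 256)          # dummy fused features
    logits = head(feats)
    loss = counting_loss(logits, torch.tensor([0, 1, 3, 2]))
    print(loss.item())
```

In practice such a count loss would be weighted and summed with the mask and category losses; the abstract's category-level supervision would add a per-class term on top.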

Results

Task                              | Dataset  | Metric | Value | Model
Instance Segmentation             | gRefCOCO | cIoU   | 65.42 | HDC
Instance Segmentation             | gRefCOCO | gIoU   | 68.28 | HDC
Referring Expression Segmentation | gRefCOCO | cIoU   | 65.42 | HDC
Referring Expression Segmentation | gRefCOCO | gIoU   | 68.28 | HDC
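For reference, the two metrics follow the GRES benchmark convention: cIoU accumulates intersection and union pixels over the whole split, while gIoU averages per-image IoU and credits a no-target sample with IoU 1.0 only when the prediction is also empty. A minimal NumPy sketch, assuming binary masks; the official evaluation code may differ in edge-case handling:

```python
import numpy as np

def ciou_giou(preds, gts):
    """cIoU: cumulative intersection / cumulative union over the split.
    gIoU: mean per-image IoU; a no-target sample scores 1.0 iff the
    prediction is also empty (GRES convention)."""
    inter_sum, union_sum, per_image = 0, 0, []
    for pred, gt in zip(preds, gts):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        inter_sum += inter
        union_sum += union
        per_image.append(1.0 if union == 0 else inter / union)
    return inter_sum / union_sum, float(np.mean(per_image))

if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=bool); gt[:2, :2] = True
    pred = np.zeros((4, 4), dtype=bool); pred[:2, :3] = True
    print(ciou_giou([pred], [gt]))  # (0.666..., 0.666...)
```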

Related Papers

DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy (2025-07-02)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval (2025-06-28)
Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models (2025-06-26)
Referring Expression Instance Retrieval and A Strong End-to-End Baseline (2025-06-23)
RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control (2025-06-15)
Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation (2025-06-12)
Synthetic Visual Genome (2025-06-09)
Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence (2025-06-09)