
Phrase Grounding by Soft-Label Chain Conditional Random Field

Jiacheng Liu, Julia Hockenmaier

2019-09-01 | IJCNLP 2019 11 | Structured Prediction | Phrase Grounding

Abstract

The phrase grounding task aims to ground each entity mention in a given caption of an image to a corresponding region in that image. Although there are clear dependencies between how different mentions of the same caption should be grounded, previous structured prediction methods that aim to capture such dependencies need to resort to approximate inference or non-differentiable losses. In this paper, we formulate phrase grounding as a sequence labeling task where we treat candidate regions as potential labels, and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions. In contrast to standard sequence labeling tasks, the phrase grounding task is defined such that there may be multiple correct candidate regions. To address this multiplicity of gold labels, we define so-called Soft-Label Chain CRFs, and present an algorithm that enables convenient end-to-end training. Our method establishes a new state-of-the-art on phrase grounding on the Flickr30k Entities dataset. Analysis shows that our model benefits both from the entity dependencies captured by the CRF and from the soft-label training regime. Our code is available at github.com/liujch1998/SoftLabelCCRF.
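To make the idea of training a chain CRF against soft labels concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the array shapes, and the choice of loss (cross-entropy between a per-mention soft target distribution `q` and the CRF distribution, computed as the log-partition minus the expected score under `q`) are assumptions made for illustration. The unary and pairwise scores would come from a neural network in practice; here they are plain arrays.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def log_partition(unary, pairwise):
    """Exact log Z of a chain CRF via the forward algorithm.

    unary:    (T, K) score of assigning region k to mention t
    pairwise: (T-1, K, K) score of regions (i, j) for mentions (t, t+1)
    """
    alpha = unary[0]
    for t in range(1, unary.shape[0]):
        # alpha[j] = unary[t, j] + log sum_i exp(alpha[i] + pairwise[t-1, i, j])
        alpha = unary[t] + logsumexp(alpha[:, None] + pairwise[t - 1], axis=0)
    return float(logsumexp(alpha, axis=0))


def soft_label_loss(unary, pairwise, q):
    """Cross-entropy between a factorized soft target q and the CRF.

    q: (T, K), each row a distribution over candidate regions.
    Assumption: the target over whole sequences is the product of the rows
    of q, so the expected sequence score decomposes over unary and
    pairwise terms.
    """
    expected = float(np.sum(q * unary))
    for t in range(unary.shape[0] - 1):
        expected += float(q[t] @ pairwise[t] @ q[t + 1])
    return log_partition(unary, pairwise) - expected
```

When every row of `q` is one-hot, this loss reduces to the usual chain-CRF negative log-likelihood of a single gold sequence. Because both terms are exact and built from differentiable operations, the same computation can be written in an autodiff framework for end-to-end training.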

Results

Task | Dataset | Metric | Value | Model
Phrase Grounding | Flickr30k Entities Test | R@1 | 74.69 | Soft-Label Chain CRF (SL-CCRF)
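
As a rough illustration of how an R@1 figure like the one above can be computed, the sketch below counts a mention as correctly grounded when the top-ranked predicted box overlaps the gold box with intersection-over-union (IoU) of at least 0.5. The 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the function names are assumptions for illustration; this page does not state the exact evaluation protocol.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(predicted, gold, threshold=0.5):
    """Percentage of mentions whose top-1 box matches the gold box.

    predicted, gold: equal-length lists of boxes, one pair per mention.
    threshold: assumed IoU cutoff for counting a prediction as correct.
    """
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, gold))
    return 100.0 * hits / len(gold)
```

For example, `recall_at_1([(0, 0, 2, 2), (0, 0, 2, 2)], [(0, 0, 2, 2), (1, 0, 3, 2)])` counts the first mention as a hit (IoU 1.0) and the second as a miss (IoU 1/3), giving 50.0.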

Related Papers

Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models (2025-06-12)
Learning Distributions over Permutations and Rankings with Factorized Representations (2025-05-30)
Nested Named Entity Recognition as Single-Pass Sequence Labeling (2025-05-22)
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures (2025-05-16)
Multi-domain Multilingual Sentiment Analysis in Industry: Predicting Aspect-based Opinion Quadruples (2025-05-15)
Structured Prediction with Abstention via the Lovász Hinge (2025-05-09)
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data (2025-03-02)
Predicting Through Generation: Why Generation Is Better for Prediction (2025-02-25)