
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching

Quanxing Zha, Xin Liu, Yiu-ming Cheung, Xing Xu, Nannan Wang, Jianjia Cao

Published: 2024-07-11 · SIGIR 2024
Tasks: Cross-Modal Retrieval · Cross-modal retrieval with noisy correspondence · Image-text Retrieval · Image-text matching
Links: Paper · PDF · Code (official)

Abstract

Cross-modal matching has recently gained significant popularity as a way to facilitate retrieval across multi-modal data, and existing works rely heavily on an implicit assumption that the training data pairs are perfectly aligned. However, such an ideal assumption rarely holds in practice due to inevitably mismatched data pairs, a.k.a. noisy correspondence, which can wrongly enforce mismatched data to be similar and thus induce performance degradation. Although some recent methods have attempted to address this problem, they still face two challenging issues: 1) unreliable data division, which leads to training inefficiency, and 2) unstable prediction, which leads to matching failure. To address these problems, we propose an efficient Uncertainty-Guided Noisy Correspondence Learning (UGNCL) framework to achieve noise-robust cross-modal matching. Specifically, a novel Uncertainty-Guided Division (UGD) algorithm is designed to reliably leverage the potential benefits of the derived uncertainty to divide the data into clean, noisy, and hard partitions, which can effortlessly mitigate the impact of easily determined noisy pairs. Meanwhile, an efficient Trusted Robust Loss (TRL) is explicitly designed to recast the soft margins, calibrated by confident yet potentially erroneous soft correspondence labels, for the data pairs in the hard partition through the uncertainty, increasing/decreasing the importance of matched/mismatched pairs and further alleviating the impact of noisy pairs for improved robustness. Extensive experiments conducted on three public datasets highlight the superiority of the proposed framework and show its competitive performance compared with the state of the art. The code is available at https://github.com/qxzha/UGNCL.
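
The sketch below is a minimal illustration of the two ideas the abstract names, not the authors' method: it assumes a per-pair uncertainty score in [0, 1] is already available, and the thresholds, base margin, and function names are all invented for illustration. Refer to the official repository linked above for the actual UGD and TRL implementations.

```python
import torch

def partition_by_uncertainty(uncertainty, low=0.2, high=0.8):
    """Split pair indices into clean / hard / noisy partitions.

    `uncertainty` holds a per-pair score in [0, 1]; the thresholds here
    are illustrative placeholders, not values from the paper.
    """
    clean = torch.where(uncertainty < low)[0]    # confidently matched: train as-is
    noisy = torch.where(uncertainty > high)[0]   # confidently mismatched: down-weight or drop
    # Everything in between is "hard": kept, but trained with a recast margin.
    hard = torch.where((uncertainty >= low) & (uncertainty <= high))[0]
    return clean, hard, noisy

def soft_margin_triplet(sim_pos, sim_neg, soft_label, base_margin=0.2):
    """Hinge-style triplet loss whose margin is scaled by a soft
    correspondence label in [0, 1]: pairs believed matched get a larger
    margin (more importance), pairs believed mismatched a smaller one."""
    margin = base_margin * soft_label
    return torch.clamp(margin + sim_neg - sim_pos, min=0).mean()

# Toy usage: six pairs with synthetic uncertainty scores.
u = torch.tensor([0.05, 0.95, 0.5, 0.1, 0.7, 0.3])
clean, hard, noisy = partition_by_uncertainty(u)
print(clean.tolist(), hard.tolist(), noisy.tolist())  # [0, 3] [2, 4, 5] [1]
```

The point of the partition is efficiency: confidently mismatched pairs are cheap to filter out up front, so the more expensive soft-margin treatment only needs to run on the ambiguous "hard" subset.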

Related Papers

An analysis of vision-language models for fabric retrieval (2025-07-07)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval (2025-06-28)
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval (2025-06-26)
Multimodal Medical Image Binding via Shared Text Embeddings (2025-06-22)
ContextRefine-CLIP for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2025 (2025-06-12)
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models (2025-06-12)
Adding simple structure at inference improves Vision-Language Compositionality (2025-06-11)
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation (2025-06-10)