

ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning

Quanxing Zha, Xin Liu, Shu-Juan Peng, Yiu-ming Cheung, Xing Xu, Nannan Wang

Published: 2025-02-27 · CVPR 2025
Tasks: Cross-Modal Retrieval · Cross-Modal Retrieval with Noisy Correspondence · Image-Text Retrieval · Image-Text Matching
Links: Paper · PDF · Code (official)

Abstract

Can we accurately identify the true correspondences in multimodal datasets containing mismatched data pairs? Existing methods primarily emphasize similarity matching between object representations across modalities, potentially neglecting the relation consistency within each modality that is particularly important for distinguishing true from false correspondences. Such an omission often risks misidentifying negatives as positives, leading to unanticipated performance degradation. To address this problem, we propose a general Relation Consistency learning framework, namely ReCon, to accurately discriminate the true correspondences among multimodal data and thus effectively mitigate the adverse impact of mismatches. Specifically, ReCon leverages a novel relation consistency learning scheme to ensure dual alignment: cross-modal relation consistency between modalities and intra-modal relation consistency within each modality. Thanks to these dual constraints on relations, ReCon significantly enhances true correspondence discrimination and therefore reliably filters out mismatched pairs, mitigating the risk of erroneous supervision. Extensive experiments on three widely used benchmark datasets, Flickr30K, MS-COCO, and Conceptual Captions, demonstrate the effectiveness and superiority of ReCon over other state-of-the-art methods. The code is available at: https://github.com/qxzha/ReCon.
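To make the dual relation-consistency idea concrete, here is a minimal PyTorch sketch of how such consistency scores might be computed over a batch of paired embeddings and used to filter likely mismatches. The function name, the specific relation measures, the equal weighting of the two terms, and the 0.5 threshold are all illustrative assumptions, not the paper's method; the authors' actual implementation is in the repository linked above.

```python
# A minimal sketch of the relation-consistency idea from the abstract, NOT the
# authors' implementation (see https://github.com/qxzha/ReCon for the official
# code). Shapes, names, weighting, and the threshold rule are assumptions.
import torch
import torch.nn.functional as F

def relation_consistency_scores(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    """Score each (image, text) pair in a batch by how consistent its
    relations to the other samples are across and within modalities.

    img_emb, txt_emb: (B, D) embeddings for B paired samples.
    Returns: (B,) scores in [-1, 1]; higher = more likely a true correspondence.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)

    # Intra-modal relation matrices: how each sample relates to the rest of
    # its own modality (cosine similarity to every other batch item).
    rel_img = img @ img.t()          # (B, B)
    rel_txt = txt @ txt.t()          # (B, B)

    # Intra-modal relation consistency: a true pair should occupy a similar
    # "position" relative to the batch in both modalities, so its row of
    # relations should agree across the two matrices.
    intra = F.cosine_similarity(rel_img, rel_txt, dim=-1)            # (B,)

    # Cross-modal relation consistency: compare each pair's relations to the
    # opposite modality as seen from the image side and from the text side.
    cross_i2t = img @ txt.t()                                        # (B, B)
    cross = F.cosine_similarity(cross_i2t, cross_i2t.t(), dim=-1)    # (B,)

    # Combine the two consistencies (equal weighting is an assumption).
    return 0.5 * (intra + cross)

if __name__ == "__main__":
    # Filtering noisy correspondences: keep pairs whose score clears a
    # threshold (0.5 is a placeholder; the paper's selection rule differs).
    B, D = 128, 256
    scores = relation_consistency_scores(torch.randn(B, D), torch.randn(B, D))
    clean_mask = scores > 0.5
    print(f"kept {int(clean_mask.sum())}/{B} pairs as likely true correspondences")
```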

Results

The page lists identical numbers under three task leaderboards (Image Retrieval with Multi-Modal Query, Cross-Modal Information Retrieval, and Cross-Modal Retrieval). The deduplicated results for ReCon per dataset are:

| Dataset | I→T R@1 | I→T R@5 | I→T R@10 | T→I R@1 | T→I R@5 | T→I R@10 | R-Sum |
|---|---|---|---|---|---|---|---|
| COCO-Noisy | 80.9 | 96.6 | 98.8 | 65.2 | 91.0 | 96.0 | 528.6 |
| CC152K | 43.1 | 68.7 | 78.1 | 44.9 | 68.3 | 77.4 | 380.5 |
| Flickr30K-Noisy | 80.3 | 95.3 | 97.8 | 61.6 | 85.5 | 91.3 | 511.8 |
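For reference, the metrics above are the standard image-text retrieval measures: Recall@K (the percentage of queries whose true match ranks in the top K) in both directions, plus R-Sum, the sum of the six recall values. Below is a minimal sketch of how they are typically computed from a similarity matrix. Note it assumes a 1:1 pairing on the diagonal; Flickr30K and MS-COCO actually pair each image with five captions, so real evaluation code uses a many-to-one ground-truth map.

```python
# A minimal sketch of Recall@K / R-Sum evaluation for image-text retrieval.
# Assumes query i's true match is gallery item i (a simplification).
import numpy as np

def recall_at_k(sim: np.ndarray, ks=(1, 5, 10)) -> dict:
    """sim: (N, N) similarity matrix, queries in rows; ground truth is the
    diagonal. Returns {k: recall in percent}."""
    n = sim.shape[0]
    # Rank of the true match = number of gallery items scored strictly higher.
    true_scores = sim[np.arange(n), np.arange(n)]
    ranks = (sim > true_scores[:, None]).sum(axis=1)
    return {k: 100.0 * float((ranks < k).mean()) for k in ks}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim_i2t = rng.standard_normal((1000, 1000))    # image-to-text scores
    i2t = recall_at_k(sim_i2t)                     # image-to-text R@{1,5,10}
    t2i = recall_at_k(sim_i2t.T)                   # text-to-image R@{1,5,10}
    r_sum = sum(i2t.values()) + sum(t2i.values())  # R-Sum, as in the table
    print(i2t, t2i, f"R-Sum={r_sum:.1f}")
```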

Related Papers

An analysis of vision-language models for fabric retrieval (2025-07-07)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval (2025-06-28)
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval (2025-06-26)
Multimodal Medical Image Binding via Shared Text Embeddings (2025-06-22)
ContextRefine-CLIP for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2025 (2025-06-12)
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models (2025-06-12)
Adding simple structure at inference improves Vision-Language Compositionality (2025-06-11)
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation (2025-06-10)