Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

2024-05-27 · CVPR 2024 · Cross-modal retrieval with noisy correspondence
Paper · PDF · Code (official)

Abstract

Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in human-annotated and web-crawled datasets. Prior approaches to leveraging such data mainly apply uni-modal noisy-label learning without addressing the impact of noise on the cross-modal and intra-modal geometrical structures in multimodal learning. In fact, we find that both structures, once well established, are effective for discriminating noisy correspondence through structural differences. Inspired by this observation, we introduce a Geometrical Structure Consistency (GSC) method to infer the true correspondence. Specifically, GSC ensures the preservation of geometrical structures within and between modalities, allowing noisy samples to be accurately discriminated based on structural differences. Utilizing these inferred true correspondence labels, GSC further refines the learning of geometrical structures by filtering out the noisy samples. Experiments across four cross-modal datasets confirm that GSC effectively identifies noisy samples and significantly outperforms current leading methods.
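The core idea — that a clean pair's image should relate to the other images the way its caption relates to the other captions — can be sketched as a simple consistency score over intra-modal similarity matrices. This is an illustrative sketch only, not the authors' official implementation; the function name and the row-wise cosine scoring rule are assumptions made for demonstration.

```python
import numpy as np

def structure_consistency_scores(img_emb, txt_emb):
    """Score each (image, text) pair by how similar its intra-modal
    neighborhood structure is across the two modalities.
    Higher score -> more structurally consistent -> more likely clean.
    Illustrative sketch, not the paper's official implementation.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Intra-modal geometrical structures: pairwise similarity matrices
    S_img = img @ img.T
    S_txt = txt @ txt.T
    # Row-wise cosine similarity between the two structures: row i compares
    # how image i sits among images vs. how caption i sits among captions
    num = (S_img * S_txt).sum(axis=1)
    den = np.linalg.norm(S_img, axis=1) * np.linalg.norm(S_txt, axis=1)
    return num / den

# Usage: treat low-scoring pairs as candidate noisy correspondences
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
txt = img + 0.05 * rng.normal(size=(8, 16))  # mostly aligned pairs
scores = structure_consistency_scores(img, txt)
print(scores.shape)  # (8,)
```

In the paper this kind of structural signal is combined with the inferred correspondence labels to filter noisy samples during training; the sketch above only shows the scoring side.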

Results

The same GSC-SGR results are reported under three task leaderboards: Image Retrieval with Multi-Modal Query, Cross-Modal Information Retrieval, and Cross-Modal Retrieval.

| Dataset | Image-to-text R@1 | R@5 | R@10 | Text-to-image R@1 | R@5 | R@10 | R-Sum | Model |
|---|---|---|---|---|---|---|---|---|
| COCO-Noisy | 79.5 | 96.4 | 98.9 | 64.4 | 90.6 | 95.9 | 525.7 | GSC-SGR |
| CC152K | 42.1 | 68.4 | 77.7 | 42.2 | 67.6 | 77.1 | 375.1 | GSC-SGR |
| Flickr30K-Noisy | 78.3 | 94.6 | 97.8 | 60.1 | 84.5 | 90.5 | 505.8 | GSC-SGR |
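R-Sum is simply the sum of the six recall values (R@1, R@5, R@10 in both retrieval directions), so the reported totals can be checked directly; here against the COCO-Noisy numbers above:

```python
# R-Sum = image-to-text R@1 + R@5 + R@10 + text-to-image R@1 + R@5 + R@10
i2t = [79.5, 96.4, 98.9]  # COCO-Noisy, image-to-text R@1/5/10
t2i = [64.4, 90.6, 95.9]  # COCO-Noisy, text-to-image R@1/5/10
r_sum = sum(i2t) + sum(t2i)
print(round(r_sum, 1))  # 525.7
```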

Related Papers

- ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning (2025-02-27)
- PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval (2024-08-02)
- UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (2024-07-11)
- Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching (2024-04-29)
- Learning with Noisy Correspondence (2024-04-13)
- Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining (2024-03-25)
- NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector (2024-03-18)
- REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence (2024-03-13)