Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

2024-05-27 · CVPR 2024 · Cross-modal retrieval with noisy correspondence
Paper · PDF · Code (official)

Abstract

Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in human-annotated and web-crawled datasets. Prior approaches to leveraging such data mainly apply uni-modal noisy-label learning without addressing the impact of noise on the cross-modal and intra-modal geometrical structures in multimodal learning. In fact, we find that both structures, once well established, are effective for discriminating noisy correspondence through structural differences. Inspired by this observation, we introduce a Geometrical Structure Consistency (GSC) method to infer the true correspondence. Specifically, GSC ensures the preservation of geometrical structures within and between modalities, allowing noisy samples to be accurately discriminated based on structural differences. Utilizing these inferred true correspondence labels, GSC further refines the learning of geometrical structures by filtering out the noisy samples. Experiments across four cross-modal datasets confirm that GSC effectively identifies noisy samples and significantly outperforms current leading methods.
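The core idea — that a clean pair's image should relate to the other images the way its caption relates to the other captions — can be sketched as a simple consistency score over intra-modal similarity matrices. This is an illustrative sketch only, not the authors' official implementation; the function name and the row-wise cosine scoring rule are assumptions made for demonstration.

```python
import numpy as np

def structure_consistency_scores(img_emb, txt_emb):
    """Score each (image, text) pair by how similar its intra-modal
    neighborhood structure is across the two modalities.
    Higher score -> more structurally consistent -> more likely clean.
    Illustrative sketch, not the paper's official implementation.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Intra-modal geometrical structures: pairwise similarity matrices
    S_img = img @ img.T
    S_txt = txt @ txt.T
    # Row-wise cosine similarity between the two structures: row i compares
    # how image i sits among images vs. how caption i sits among captions
    num = (S_img * S_txt).sum(axis=1)
    den = np.linalg.norm(S_img, axis=1) * np.linalg.norm(S_txt, axis=1)
    return num / den

# Usage: treat low-scoring pairs as candidate noisy correspondences
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
txt = img + 0.05 * rng.normal(size=(8, 16))  # mostly aligned pairs
scores = structure_consistency_scores(img, txt)
print(scores.shape)  # (8,)
```

In the paper this kind of structural signal is combined with the inferred correspondence labels to filter noisy samples during training; the sketch above only shows the scoring side.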

Results

The same GSC-SGR results are reported under three task leaderboards: Image Retrieval with Multi-Modal Query, Cross-Modal Information Retrieval, and Cross-Modal Retrieval.

| Dataset | Image-to-text R@1 | R@5 | R@10 | Text-to-image R@1 | R@5 | R@10 | R-Sum | Model |
|---|---|---|---|---|---|---|---|---|
| COCO-Noisy | 79.5 | 96.4 | 98.9 | 64.4 | 90.6 | 95.9 | 525.7 | GSC-SGR |
| CC152K | 42.1 | 68.4 | 77.7 | 42.2 | 67.6 | 77.1 | 375.1 | GSC-SGR |
| Flickr30K-Noisy | 78.3 | 94.6 | 97.8 | 60.1 | 84.5 | 90.5 | 505.8 | GSC-SGR |
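R-Sum is simply the sum of the six recall values (R@1, R@5, R@10 in both retrieval directions), so the reported totals can be checked directly; here against the COCO-Noisy numbers above:

```python
# R-Sum = image-to-text R@1 + R@5 + R@10 + text-to-image R@1 + R@5 + R@10
i2t = [79.5, 96.4, 98.9]  # COCO-Noisy, image-to-text R@1/5/10
t2i = [64.4, 90.6, 95.9]  # COCO-Noisy, text-to-image R@1/5/10
r_sum = sum(i2t) + sum(t2i)
print(round(r_sum, 1))  # 525.7
```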

Related Papers

- ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning (2025-02-27)
- PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval (2024-08-02)
- UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (2024-07-11)
- Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching (2024-04-29)
- Learning with Noisy Correspondence (2024-04-13)
- Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining (2024-03-25)
- NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector (2024-03-18)
- REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence (2024-03-13)