Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence

Ruochen Zheng, Jiahao Hong, Changxin Gao, Nong Sang

2024-03-13 · Cross-modal retrieval with noisy correspondence

Abstract

The presence of noise in acquired data invariably leads to performance degradation in cross-modal matching. Unfortunately, obtaining precise annotations in the multimodal field is expensive, which has prompted some methods to tackle the mismatched data pair issue in cross-modal matching contexts, termed noisy correspondence. However, most existing noisy correspondence methods exhibit the following limitations: a) self-reinforcing error accumulation, and b) improper handling of noisy data pairs. To tackle these two problems, we propose a generalized framework termed Rank corrElation and noisy Pair hAlf-replacing wIth memoRy (REPAIR), which benefits from maintaining a memory bank of features of matched pairs. Specifically, we calculate the distances between the features in the memory bank and those of the target pair for each respective modality, and use the rank correlation of these two sets of distances to estimate the soft correspondence label of the target pair. Estimating soft correspondence from memory bank features rather than from a similarity network avoids the accumulation of errors caused by incorrect network identifications. For pairs that are completely mismatched, REPAIR searches the memory bank for the best-matching feature to replace the feature of one modality, instead of using the original pair directly or merely discarding it. We conduct experiments on three cross-modal datasets, i.e., Flickr30K, MSCOCO, and CC152K, demonstrating the effectiveness and robustness of REPAIR under both synthetic and real-world noise.
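The two core ideas in the abstract — estimating a soft correspondence label from the rank correlation of per-modality distances to a memory bank, and replacing one side of a fully mismatched pair with its nearest memory-bank feature — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the Euclidean distance, and the clipping of negative correlations to zero are all assumptions made here for clarity.

```python
import numpy as np

def rank(x):
    # Convert values to ranks 0..n-1 (ties broken by argsort order).
    order = np.argsort(x)
    r = np.empty_like(order)
    r[order] = np.arange(len(x))
    return r

def spearman(a, b):
    # Spearman rank correlation: Pearson correlation of the ranks.
    ra, rb = rank(a).astype(float), rank(b).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / (np.linalg.norm(ra) * np.linalg.norm(rb) + 1e-12))

def soft_label(v, t, mem_img, mem_txt):
    # Distances from the target image/text features to each memory-bank entry.
    d_img = np.linalg.norm(mem_img - v, axis=1)
    d_txt = np.linalg.norm(mem_txt - t, axis=1)
    # A well-matched pair should induce similarly ordered distances in both
    # modalities; rank correlation measures that agreement. Clipping negative
    # correlation to 0 is an assumption, not from the paper.
    return max(spearman(d_img, d_txt), 0.0)

def half_replace(v, mem_img, mem_txt):
    # For a fully mismatched pair, keep the image side and swap in the text
    # feature whose paired memory image is closest to v.
    idx = int(np.argmin(np.linalg.norm(mem_img - v, axis=1)))
    return mem_txt[idx]
```

Because the soft label depends only on stored features of pairs already judged matched, a misprediction by the similarity network at one step does not feed back into later label estimates — which is the error-accumulation argument the abstract makes.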

Results

REPAIR is listed under three task leaderboards (Image Retrieval with Multi-Modal Query, Cross-Modal Information Retrieval, and Cross-Modal Retrieval) with identical results on each; the shared results are:

| Dataset         | Image→Text R@1 | Image→Text R@5 | Image→Text R@10 | Text→Image R@1 | Text→Image R@5 | Text→Image R@10 | R-Sum | Model  |
|-----------------|----------------|----------------|-----------------|----------------|----------------|-----------------|-------|--------|
| COCO-Noisy      | 78.3           | 96.8           | 98.3            | 62.5           | 89.8           | 95.5            | 521.2 | REPAIR |
| CC152K          | 40.5           | 67.7           | 76.1            | 40.3           | 68.2           | 76.4            | 369.2 | REPAIR |
| Flickr30K-Noisy | 79.2           | 95.0           | 96.9            | 59.4           | 84.4           | 89.5            | 504.4 | REPAIR |

Related Papers

- ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning (2025-02-27)
- PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval (2024-08-02)
- UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (2024-07-11)
- Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning (2024-05-27)
- Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching (2024-04-29)
- Learning with Noisy Correspondence (2024-04-13)
- Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining (2024-03-25)
- NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector (2024-03-18)