Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

NAPReg

Reported on 27 benchmarks across 3 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Miscellaneous: 18 results

| Task | Dataset | Metric | NAPReg | Best (model) |
| --- | --- | --- | --- | --- |
| Image Retrieval with Multi-Modal Query | MSCOCO-1k | Image-to-text R@1 | 81.9 | |
| Image Retrieval with Multi-Modal Query | MSCOCO-1k | Text-to-image R@1 | 66.9 | |
| Image Retrieval with Multi-Modal Query | Flickr30k | Image-to-text R@1 | 79.6 | 98.8 (X2-VLM (large)) |
| Image Retrieval with Multi-Modal Query | Flickr30k | Text-to-image R@1 | 60 | 93.3 (ERNIE-ViL 2.0) |
| Image Retrieval with Multi-Modal Query | MS-COCO-2014 | Text-to-image R@1 | 43 | |
| Image Retrieval with Multi-Modal Query | COCO 2014 | Image-to-text R@1 | 59.8 | 84.8 (BEiT-3) |
| Image Retrieval with Multi-Modal Query | COCO 2014 | Text-to-image R@1 | 43 | 68 (VAST) |
| Image Retrieval with Multi-Modal Query | Flickr-8k | Image-to-text R@1 | 56.2 | |
| Image Retrieval with Multi-Modal Query | Flickr-8k | Text-to-image R@1 | 39.2 | |
| Cross-Modal Information Retrieval | MSCOCO-1k | Image-to-text R@1 | 81.9 | |
| Cross-Modal Information Retrieval | MSCOCO-1k | Text-to-image R@1 | 66.9 | |
| Cross-Modal Information Retrieval | Flickr30k | Image-to-text R@1 | 79.6 | 98.8 (X2-VLM (large)) |
| Cross-Modal Information Retrieval | Flickr30k | Text-to-image R@1 | 60 | 93.3 (ERNIE-ViL 2.0) |
| Cross-Modal Information Retrieval | MS-COCO-2014 | Text-to-image R@1 | 43 | |
| Cross-Modal Information Retrieval | COCO 2014 | Image-to-text R@1 | 59.8 | 84.8 (BEiT-3) |
| Cross-Modal Information Retrieval | COCO 2014 | Text-to-image R@1 | 43 | 68 (VAST) |
| Cross-Modal Information Retrieval | Flickr-8k | Image-to-text R@1 | 56.2 | |
| Cross-Modal Information Retrieval | Flickr-8k | Text-to-image R@1 | 39.2 | |

Natural Language Processing: 9 results

| Task | Dataset | Metric | NAPReg | Best (model) |
| --- | --- | --- | --- | --- |
| Cross-Modal Retrieval | MSCOCO-1k | Image-to-text R@1 | 81.9 | |
| Cross-Modal Retrieval | MSCOCO-1k | Text-to-image R@1 | 66.9 | |
| Cross-Modal Retrieval | Flickr30k | Image-to-text R@1 | 79.6 | 98.8 (X2-VLM (large)) |
| Cross-Modal Retrieval | Flickr30k | Text-to-image R@1 | 60 | 93.3 (ERNIE-ViL 2.0) |
| Cross-Modal Retrieval | MS-COCO-2014 | Text-to-image R@1 | 43 | |
| Cross-Modal Retrieval | COCO 2014 | Image-to-text R@1 | 59.8 | 84.8 (BEiT-3) |
| Cross-Modal Retrieval | COCO 2014 | Text-to-image R@1 | 43 | 68 (VAST) |
| Cross-Modal Retrieval | Flickr-8k | Image-to-text R@1 | 56.2 | |
| Cross-Modal Retrieval | Flickr-8k | Text-to-image R@1 | 39.2 | |