Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Flickr30k

Modalities: Images, Texts
License: Custom (research-only, non-commercial)
Introduced: 2014-01-01

The Flickr30k dataset contains 31,000 images collected from Flickr, each paired with 5 reference sentences written by human annotators.

Source: Guiding Long-Short Term Memory for Image Caption Generation

Image Source: Dual-Path Convolutional Image-Text Embedding with Instance Loss

Benchmarks

Cross-Modal Information Retrieval: Image-to-text R@1, R@5, R@10; Text-to-image R@1, R@5, R@10
Cross-Modal Retrieval: Image-to-text R@1, R@5, R@10; Text-to-image R@1, R@5, R@10
Image Captioning: CIDEr
Image Retrieval: Recall@10, Recall@5, Recall@1, Recall@Sum; Image-to-text R@1, R@10, R@5; QPS
Image Retrieval with Multi-Modal Query: Image-to-text R@1, R@5, R@10; Text-to-image R@1, R@5, R@10
Image-to-Text Retrieval: Recall@1, Recall@5, Recall@10, Recall@Sum
Phrase Grounding: Pointing Game Accuracy
Semi Supervised Learning for Image Captioning: CIDEr
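Most of the retrieval benchmarks above report R@K (Recall@K). This page does not define the metric, so the following is only a minimal sketch under a common convention: a query counts as a hit at K if any of its correct candidates appears among the K highest-scoring candidates, and R@K is the percentage of queries that are hits. Recall@Sum (R-Sum) is assumed here to be the sum of the R@K values over both retrieval directions. The function names `recall_at_k` and `recall_sum` and the toy scores are hypothetical, and the toy data uses 2 captions per image rather than Flickr30k's 5.

```python
from typing import Dict, Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Return R@K in percent for each K in `ks`.

    scores[q][c] is the similarity between query q and candidate c.
    relevant[q] is the set of candidate indices that are correct for query q.
    """
    hits = {k: 0 for k in ks}
    for row, gold in zip(scores, relevant):
        # Candidate indices ordered from most to least similar.
        ranking = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if gold.intersection(ranking[:k]):
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(scores) for k in ks}


def recall_sum(*results: Dict[int, float]) -> float:
    """Sum all R@K values across the given result dicts (assumed R-Sum convention)."""
    return sum(value for result in results for value in result.values())


# Toy similarity matrix: rows are 2 images, columns are 4 captions.
# Captions 0-1 describe image 0; captions 2-3 describe image 1.
image_caption_scores = [
    [0.9, 0.2, 0.1, 0.3],
    [0.8, 0.1, 0.4, 0.5],
]

# Image-to-text: each image queries the caption pool.
i2t = recall_at_k(image_caption_scores, [{0, 1}, {2, 3}], ks=(1, 2))

# Text-to-image: transpose so each caption queries the image pool.
caption_image_scores = [
    [row[c] for row in image_caption_scores] for c in range(4)
]
t2i = recall_at_k(caption_image_scores, [{0}, {0}, {1}, {1}], ks=(1, 2))

print(i2t, t2i, recall_sum(i2t, t2i))
```

With the standard K values of 1, 5, and 10 in both directions, the sum computed this way has a maximum of 600.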

Related Benchmarks

Flickr30K 1K test / Image Retrieval: R@1, R@10, R@5
Flickr30K-Noisy / Cross-Modal Information Retrieval: Image-to-text R@1, R@10, R@5; R-Sum; Text-to-image R@1, R@10, R@5
Flickr30K-Noisy / Cross-Modal Retrieval: Image-to-text R@1, R@10, R@5; R-Sum; Text-to-image R@1, R@10, R@5
Flickr30K-Noisy / Image Retrieval with Multi-Modal Query: Image-to-text R@1, R@10, R@5; R-Sum; Text-to-image R@1, R@10, R@5
Flickr30k Captions test / Image Captioning: BLEU-4, CIDEr, METEOR, SPICE
Flickr30k Entities Dev / Phrase Grounding: R@1, R@10, R@5
Flickr30k Entities Test / Phrase Grounding: R@1, R@10, R@5
Flickr30k-CN / Image Retrieval: R@1, R@10, R@5

Statistics

Papers: 880
Benchmarks: 33

Links

Homepage

Tasks

Cross-Modal Information Retrieval
Cross-Modal Retrieval
Image Captioning
Image Retrieval
Image Retrieval with Multi-Modal Query
Image-to-Text Retrieval
Node Classification
Phrase Grounding
Semi Supervised Learning for Image Captioning
Video Description
Zero-Shot Cross-Modal Retrieval
Zero-shot Text-to-Image Retrieval