RSICD
Remote Sensing Image Captioning Dataset
Images
The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC and Tianditu. The images are fixed to 224×224 pixels with various resolutions. The dataset contains 10,921 images in total, with five sentence descriptions per image.
Source: https://github.com/201528014227051/RSICD_optimal
Image Source: https://github.com/201528014227051/RSICD_optimal
Benchmarks
Cross-Modal Information Retrieval/Mean Recall
Cross-Modal Information Retrieval/Image-to-text R@1
Cross-Modal Information Retrieval/text-to-image R@1
Cross-Modal Retrieval/Mean Recall
Cross-Modal Retrieval/Image-to-text R@1
Cross-Modal Retrieval/text-to-image R@1
Image Retrieval with Multi-Modal Query/Mean Recall
Image Retrieval with Multi-Modal Query/Image-to-text R@1
Image Retrieval with Multi-Modal Query/text-to-image R@1
Image-to-Text Retrieval/Image to Text Recall@1
Retrieval/Recall@1
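
The benchmarks above report Recall@K (R@1) and Mean Recall for retrieval between images and captions. The following is a minimal sketch of how such metrics are commonly computed, not the evaluation code of any particular benchmark. It assumes a precomputed image-by-caption similarity matrix, and it assumes Mean Recall is the average of R@K over both retrieval directions for a chosen set of K values; individual leaderboards may define it differently. The names `recall_at_k`, `mean_recall`, and the toy data are hypothetical.

```python
import numpy as np


def recall_at_k(similarity, positives, k):
    """Fraction of queries with at least one correct item in the top k.

    similarity: (num_queries, num_items) array, higher means more similar.
    positives:  list of sets; positives[i] holds the correct item indices
                for query i.
    """
    # Indices of the k highest-scoring items for each query.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = [
        bool(positives[i] & set(top_k[i].tolist()))
        for i in range(len(positives))
    ]
    return float(np.mean(hits))


def mean_recall(sim_img_to_cap, img_to_caps, ks=(1, 5, 10)):
    """Average R@K over both directions (assumed definition of Mean Recall).

    sim_img_to_cap: (num_images, num_captions) similarity matrix.
    img_to_caps:    list of sets of caption indices, one set per image
                    (RSICD provides five captions per image).
    """
    num_caps = sim_img_to_cap.shape[1]
    # Invert the mapping so each caption knows its ground-truth image.
    cap_to_img = [set() for _ in range(num_caps)]
    for img, caps in enumerate(img_to_caps):
        for cap in caps:
            cap_to_img[cap].add(img)

    # Image-to-text uses the matrix as is; text-to-image uses its transpose.
    scores = [recall_at_k(sim_img_to_cap, img_to_caps, k) for k in ks]
    scores += [recall_at_k(sim_img_to_cap.T, cap_to_img, k) for k in ks]
    return float(np.mean(scores))


# Toy example: 2 images, 2 captions each (made-up similarity scores).
toy_sim = np.array([
    [0.9, 0.1, 0.2, 0.8],
    [0.3, 0.7, 0.6, 0.1],
])
toy_img_to_caps = [{0, 1}, {2, 3}]

i2t_r1 = recall_at_k(toy_sim, toy_img_to_caps, k=1)
toy_mr = mean_recall(toy_sim, toy_img_to_caps, ks=(1, 2))
```

Counting a query as a hit when any one of its ground-truth captions appears in the top K is the usual convention for image-to-text retrieval on datasets with multiple captions per image; stricter variants exist, so check a benchmark's own protocol before comparing numbers.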