RSICD

Remote Sensing Image Captioning Dataset

Images

The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC, and Tianditu. The images are fixed at 224×224 pixels and have various resolutions. The dataset contains 10,921 images in total, each with five sentence descriptions.

Source: https://github.com/201528014227051/RSICD_optimal
Image Source: https://github.com/201528014227051/RSICD_optimal

Benchmarks

Cross-Modal Information Retrieval / Mean Recall
Cross-Modal Information Retrieval / Image-to-text R@1
Cross-Modal Information Retrieval / text-to-image R@1
Cross-Modal Retrieval / Mean Recall
Cross-Modal Retrieval / Image-to-text R@1
Cross-Modal Retrieval / text-to-image R@1
Image Retrieval with Multi-Modal Query / Mean Recall
Image Retrieval with Multi-Modal Query / Image-to-text R@1
Image Retrieval with Multi-Modal Query / text-to-image R@1
Image-to-Text Retrieval / Image to Text Recall@1
Retrieval / Recall@1
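
The benchmarks above report R@1 and Mean Recall without defining them. The following is a minimal Python sketch of how these retrieval metrics are commonly computed, not the evaluation code of any particular benchmark. It assumes that R@K is the fraction of queries with at least one correct match among the top K candidates, and that Mean Recall is the average of R@K over both retrieval directions for a chosen set of K values (often 1, 5, and 10). The function names, the toy similarity scores, and the use of two captions per image (RSICD has five) are all illustrative assumptions.

```python
def recall_at_k(similarity, positives, k):
    """Fraction of queries with at least one correct item among the top-k.

    similarity: list of rows, one per query; similarity[q][c] scores candidate c.
    positives:  list of sets; positives[q] holds the correct candidate indices.
    """
    hits = 0
    for scores, correct in zip(similarity, positives):
        # Rank candidate indices by descending score and keep the first k.
        top_k = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        if correct.intersection(top_k):
            hits += 1
    return hits / len(similarity)


def mean_recall(i2t_sim, i2t_pos, t2i_sim, t2i_pos, ks=(1, 5, 10)):
    """Average of R@K over image-to-text and text-to-image for each K in ks.

    This averaging convention is an assumption; check the benchmark's own
    definition before comparing numbers.
    """
    recalls = [recall_at_k(i2t_sim, i2t_pos, k) for k in ks]
    recalls += [recall_at_k(t2i_sim, t2i_pos, k) for k in ks]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 2 images, 4 captions, 2 captions per image.
# Rows are images, columns are captions.
i2t_sim = [
    [0.9, 0.2, 0.1, 0.3],  # image 0, correct captions: 0 and 1
    [0.8, 0.1, 0.4, 0.6],  # image 1, correct captions: 2 and 3
]
i2t_pos = [{0, 1}, {2, 3}]

# Text-to-image uses the transposed scores: rows are captions, columns are images.
t2i_sim = [list(column) for column in zip(*i2t_sim)]
t2i_pos = [{0}, {0}, {1}, {1}]

print(recall_at_k(i2t_sim, i2t_pos, 1))  # 0.5
print(recall_at_k(t2i_sim, t2i_pos, 1))  # 1.0
print(mean_recall(i2t_sim, i2t_pos, t2i_sim, t2i_pos, ks=(1, 2)))  # 0.875
```

Because each image has several captions, an image-to-text query counts as a hit if any one of its captions is retrieved, while a text-to-image query has exactly one correct image.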