Simple Shapes Dataset
Modalities: Images, Texts. License: CC-BY. Introduced: 2023-06-27.
It consists of 32x32-pixel images of shapes with multiple attributes (size, location, rotation, color). Each image is paired with its ground-truth attributes and an English natural-language description of the image.
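To illustrate what a single sample carries, here is a minimal Python sketch of one record. The field names, the `shape` category id, the `(x, y)` location convention, the units, and the RGB color encoding are all assumptions made for illustration. They are not the dataset's documented file format.

```python
from dataclasses import dataclass

IMAGE_SIZE = 32  # images are 32x32 pixels


@dataclass(frozen=True)
class ShapeSample:
    """Hypothetical record for one Simple Shapes sample (field names are assumptions)."""

    shape: int                     # hypothetical integer id of the shape category
    size: float                    # shape size in pixels (assumed unit)
    location: tuple[float, float]  # (x, y) of the shape center in pixels (assumed convention)
    rotation: float                # rotation angle in radians (assumed unit)
    color: tuple[int, int, int]    # RGB, each channel in 0..255 (assumed encoding)
    description: str               # English natural-language description

    def is_valid(self) -> bool:
        """Basic sanity checks on the assumed attribute ranges."""
        x, y = self.location
        in_bounds = 0 <= x < IMAGE_SIZE and 0 <= y < IMAGE_SIZE
        color_ok = all(0 <= c <= 255 for c in self.color)
        return in_bounds and color_ok and self.size > 0 and bool(self.description)


if __name__ == "__main__":
    sample = ShapeSample(
        shape=0,
        size=10.0,
        location=(16.0, 16.0),
        rotation=0.5,
        color=(200, 30, 30),
        description="A small red shape in the center of the image.",
    )
    print(sample.is_valid())  # True
```

Keeping the attributes and the description in one immutable record makes it easy to check that each caption stays paired with the ground truth it describes.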
The dataset is split into a training set of 500,000 samples and validation and test sets of 1,000 samples each.
It also includes precomputed 12-dimensional visual features (from a VAE) and precomputed BERT features of the text descriptions.
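Because the visual and text features are precomputed, a model can train on the feature arrays directly instead of re-encoding images and text. The sketch below assumes that each split's features are held as NumPy arrays with one row per sample, aligned by row index. The helper names, the array layout, and the BERT feature width used in the demo (768) are assumptions, since this description specifies neither the file format nor the BERT feature dimension.

```python
import numpy as np

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 500_000, "val": 1_000, "test": 1_000}
VISUAL_DIM = 12  # dimension of the precomputed VAE visual features


def check_split_features(split: str, visual: np.ndarray, text: np.ndarray) -> None:
    """Check that precomputed features for one split are aligned (hypothetical helper).

    visual: shape (n_samples, 12), one VAE feature vector per image.
    text:   shape (n_samples, d_text), one BERT feature vector per description;
            d_text is left open because it is not stated in the description.
    """
    expected = SPLIT_SIZES[split]
    if visual.shape != (expected, VISUAL_DIM):
        raise ValueError(
            f"{split}: visual features have shape {visual.shape}, "
            f"expected ({expected}, {VISUAL_DIM})"
        )
    if text.ndim != 2 or text.shape[0] != expected:
        raise ValueError(
            f"{split}: text features have shape {text.shape}, "
            f"expected ({expected}, d_text)"
        )


def iterate_pairs(visual: np.ndarray, text: np.ndarray, batch_size: int):
    """Yield aligned (visual, text) minibatches; row i of each array is sample i."""
    for start in range(0, visual.shape[0], batch_size):
        yield visual[start:start + batch_size], text[start:start + batch_size]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the validation split; 768 is an assumed text width.
    val_visual = rng.standard_normal((1_000, VISUAL_DIM))
    val_text = rng.standard_normal((1_000, 768))
    check_split_features("val", val_visual, val_text)
    n_batches = sum(1 for _ in iterate_pairs(val_visual, val_text, batch_size=256))
    print(n_batches)  # 4
```

Checking shapes up front catches a misaligned or truncated feature file before training, which matters because the pairing between image and text features is only implied by row order in this assumed layout.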
Link to dataset: https://zenodo.org/record/8112838