CC12M

Conceptual 12M

Images, Texts

Introduced 2021-02-17

Conceptual 12M (CC12M) is a dataset of 12 million image-text pairs intended for vision-and-language pre-training.

Source: Changpinyo et al.

Image source: Changpinyo et al.