Conceptual Captions 152K
CC152K (Conceptual Captions 152K) is a subset of Conceptual Captions with 152,000 samples in total: 150,000 randomly selected from the training split for training, plus 1,000 for validation and 1,000 for testing, both drawn from the validation split.
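
The split sizes described above can be sketched as index sampling. This is only an illustration, not the procedure used to build the released subset: the function name `make_cc152k_style_split`, the seed, and the pool sizes passed in are hypothetical, and drawing the validation and test samples so that they do not overlap is an assumption the description does not state.

```python
import random


def make_cc152k_style_split(n_train_pool, n_val_pool, seed=0,
                            n_train=150_000, n_val=1_000, n_test=1_000):
    """Return (train, val, test) index lists into the source splits.

    n_train_pool / n_val_pool are the sizes of the source training and
    validation splits; both are supplied by the caller (hypothetical here).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # 150,000 indices sampled without replacement from the training split.
    train_idx = rng.sample(range(n_train_pool), n_train)

    # Assumption: validation and test are disjoint, so draw 2,000 indices
    # from the validation split at once and cut them into two halves.
    held_out = rng.sample(range(n_val_pool), n_val + n_test)
    val_idx = held_out[:n_val]
    test_idx = held_out[n_val:]

    return train_idx, val_idx, test_idx
```

For example, `train, val, test = make_cc152k_style_split(3_000_000, 15_000)` uses placeholder pool sizes; substitute the actual sizes of the Conceptual Captions splits you have downloaded.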