Infinity-MM

Images, Texts, Videos
Introduced 2024-10-24

We collect, organize, and open-source Infinity-MM, a large-scale multimodal instruction dataset consisting of tens of millions of samples. Quality filtering and deduplication keep the dataset both high-quality and diverse. We also propose a synthetic data generation method based on open-source models and a labeling system, which combines detailed image annotations with diverse question generation.
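The description above does not say how quality filtering and deduplication are actually performed. As a rough illustration of the general idea only, the sketch below drops samples with empty answers and removes exact duplicates after whitespace and case normalization. The field names (`image_id`, `question`, `answer`), the `min_answer_chars` threshold, and the hashing scheme are all assumptions made for this example, not details of the Infinity-MM pipeline.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def sample_key(sample: dict) -> str:
    """Hash the (image, question, answer) triple; field names are hypothetical."""
    payload = "\x1f".join(
        [sample["image_id"], normalize(sample["question"]), normalize(sample["answer"])]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_and_deduplicate(samples, min_answer_chars=1):
    """Keep the first occurrence of each sample that passes a minimal quality check.

    The length threshold is a stand-in for a real quality filter (assumption).
    """
    seen = set()
    kept = []
    for sample in samples:
        # Minimal quality filter: skip samples whose answer is effectively empty.
        if len(sample["answer"].strip()) < min_answer_chars:
            continue
        key = sample_key(sample)
        # Exact-match deduplication on the normalized triple.
        if key in seen:
            continue
        seen.add(key)
        kept.append(sample)
    return kept


if __name__ == "__main__":
    demo = [
        {"image_id": "a", "question": "What is shown?", "answer": "A cat."},
        {"image_id": "a", "question": "what is  shown?", "answer": "a cat."},
        {"image_id": "a", "question": "What is shown?", "answer": "  "},
        {"image_id": "b", "question": "What is shown?", "answer": "A cat."},
    ]
    for row in filter_and_deduplicate(demo):
        print(row["image_id"], row["question"])
```

Exact hashing only catches literal repeats; a dataset at this scale would more plausibly need near-duplicate detection and model-based quality scoring, which this sketch does not attempt.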