OpenS2V-5M
Images · Texts · Videos · CC-BY-4.0 · Introduced 2025-05-26
We introduce OpenS2V-5M, the first open-source large-scale subject-to-video (S2V) generation dataset, consisting of five million high-quality 720P subject-text-video triples. To ensure subject-information diversity, we (1) segment subjects and build pairing information via cross-video associations, and (2) prompt GPT-Image on raw frames to synthesize multi-view representations. The dataset supports both Subject-to-Video and Text-to-Video generation tasks.
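As a rough illustration of what a subject-text-video triple contains, one sample could be modeled as below. This is a minimal sketch: the field names (`video_path`, `caption`, `subject_images`) and file paths are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubjectTextVideoTriple:
    """Hypothetical schema for one OpenS2V-5M sample (names are illustrative)."""
    video_path: str                  # a 720P video clip
    caption: str                     # text description paired with the clip
    # Segmented subject crops, possibly from multiple views or cross-video matches
    subject_images: List[str] = field(default_factory=list)

# Example sample with two subject views (paths are made up for illustration)
sample = SubjectTextVideoTriple(
    video_path="clips/000001.mp4",
    caption="A golden retriever runs across a sunny park.",
    subject_images=["subjects/000001_view0.png", "subjects/000001_view1.png"],
)
print(len(sample.subject_images))  # → 2
```

In this view, step (1) of the pipeline would populate `subject_images` with crops linked across videos, and step (2) would add synthesized multi-view images of the same subject.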