VidChapters-7M
Modalities: Texts, Videos · MIT · Introduced: 2023-09-25
VidChapters-7M is a dataset of 817K user-chaptered videos comprising 7M chapters in total. It is created automatically and at scale by scraping user-annotated chapters from online videos, requiring no additional manual annotation. The dataset is designed for training and evaluating models on video chapter generation (with or without ground-truth boundaries) and video chapter grounding, as well as for video-language pretraining.
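User-annotated chapters are typically a list of (start time, title) marks, where each chapter implicitly ends where the next begins. The sketch below illustrates this structure with hypothetical field layouts; it does not reflect the dataset's actual schema or file format.

```python
from typing import List, Tuple

def chapters_to_segments(marks: List[Tuple[float, str]],
                         video_duration: float) -> List[Tuple[float, float, str]]:
    """Convert (start_time, title) chapter marks into (start, end, title)
    segments; each chapter ends where the next one begins, and the last
    chapter ends at the video's duration."""
    segments = []
    for i, (start, title) in enumerate(marks):
        end = marks[i + 1][0] if i + 1 < len(marks) else video_duration
        segments.append((start, end, title))
    return segments

# Illustrative example: a 6-minute video with three user-annotated chapters.
marks = [(0.0, "Intro"), (45.0, "Setup"), (210.0, "Demo")]
segments = chapters_to_segments(marks, 360.0)
# → [(0.0, 45.0, 'Intro'), (45.0, 210.0, 'Setup'), (210.0, 360.0, 'Demo')]
```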
Benchmarks
- Dense Video Captioning: CIDEr
- Language-Based Temporal Localization: R1@0.9, R@10s
- Temporal Localization: R1@0.9, R@10s
- Video Captioning: CIDEr
- Video Chaptering: CIDEr, SODA, P@0.5, P@0.7, P@3s, P@5s, R@0.5, R@0.7, R@3s, R@5s
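Several of the chaptering metrics above (P@3s, P@5s, R@3s, R@5s) score predicted chapter boundaries against ground truth within a time tolerance. The sketch below shows one plausible reading of such metrics: a predicted boundary counts if it lies within the tolerance of some ground-truth boundary, and vice versa for recall. The benchmark's exact matching protocol (e.g. one-to-one matching of boundaries) may differ.

```python
from typing import List, Tuple

def boundary_precision_recall(pred: List[float], gt: List[float],
                              tol: float) -> Tuple[float, float]:
    """Precision: fraction of predicted boundaries within tol seconds of
    any ground-truth boundary. Recall: fraction of ground-truth boundaries
    within tol seconds of any prediction."""
    matched_pred = sum(any(abs(p - g) <= tol for g in gt) for p in pred)
    matched_gt = sum(any(abs(p - g) <= tol for p in pred) for g in gt)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gt / len(gt) if gt else 0.0
    return precision, recall

# Illustrative boundaries (in seconds): two of three predictions fall
# within 5 s of a ground-truth boundary, so P@5s = R@5s = 2/3 here.
p, r = boundary_precision_recall([10.0, 52.0, 200.0], [9.0, 50.0, 120.0], tol=5.0)
```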