MultiSum
MultiSum is a dataset for multimodal summarization with multimodal output (MSMO), built to cover a diverse array of real-world scenarios. The dataset features:
- Human-validated summaries for both video and textual content, providing high-quality human instruction and labels for multimodal learning.
- A comprehensive, carefully organized categorization spanning 17 principal categories and 170 subcategories, covering a diverse array of real-world scenarios.
- Benchmark tests on the dataset that assess a range of tasks and methods, including video temporal segmentation, video summarization, text summarization, and multimodal summarization.
Source: MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos