MovieSum: An Abstractive Summarization Dataset for Movie Screenplays

Rohit Saxena, Frank Keller

2024-08-12 | Long-Form Narrative Summarization | Abstractive Text Summarization | Document Summarization

Abstract

Movie screenplay summarization is challenging, as it requires an understanding of long input contexts and various elements unique to movies. Large language models have shown significant advancements in document summarization, but they often struggle with processing long input contexts. Furthermore, while television transcripts have received attention in recent studies, movie screenplay summarization remains underexplored. To stimulate research in this area, we present a new dataset, MovieSum, for abstractive summarization of movie screenplays. This dataset comprises 2200 movie screenplays accompanied by their Wikipedia plot summaries. We manually formatted the movie screenplays to represent their structural elements. Compared to existing datasets, MovieSum possesses several distinctive features: (1) It includes movie screenplays, which are longer than scripts of TV episodes. (2) It is twice the size of previous movie screenplay datasets. (3) It provides metadata with IMDb IDs to facilitate access to additional external knowledge. We also show the results of recently released large language models applied to summarization on our dataset to provide a detailed baseline.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Summarization | MovieSum | BERTScore (F1) | 58.92 | Description Only (LED) |
| Text Summarization | MovieSum | BERTScore (F1) | 58.54 | Two-Stage Heuristic (LED Large) |
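
For readers unfamiliar with the metric in the table, the following is a minimal sketch of how a BERTScore-style F1 is formed from token embeddings: each candidate token is greedily matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined as a harmonic mean. The function names and the toy 2-dimensional vectors are hypothetical and stand in for contextual embeddings from a pretrained model. The sketch omits optional idf weighting and baseline rescaling, and this page does not state which settings produced the reported values, so this is not a reproduction of the scores above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate_vecs, reference_vecs):
    """Greedy-matching F1 over token vectors (hypothetical helper, no idf weighting)."""
    # Precision: average best match of each candidate token against the reference.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: average best match of each reference token against the candidate.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy vectors standing in for contextual token embeddings (illustrative only).
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 1.0]]
score = bertscore_f1(candidate, reference)
print(round(100 * score, 2))  # scaled to 0-100 for display
```

The final line multiplies by 100 on the assumption that the table reports scores on a 0-100 scale; the raw F1 from this computation lies in the range of cosine similarity.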

Related Papers

GenerationPrograms: Fine-grained Attribution with Executable Programs (2025-06-17)
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences (2025-06-16)
Improving Fairness of Large Language Models in Multi-document Summarization (2025-06-09)
Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs (2025-06-03)
NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization (2025-05-30)
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs (2025-05-29)
Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality (2025-05-22)
Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization (2025-05-22)