SummScreen

Introduced 2021-04-14

SummScreen is a dataset for abstractive screenplay summarization, consisting of pairs of TV series transcripts and human-written recaps. It is a challenging testbed for abstractive summarization for several reasons:

  • Plot details are often expressed indirectly in character dialogue and may be scattered across the entire transcript.
  • These details must be found and integrated to form the succinct plot descriptions in the recaps.
  • TV scripts contain content that does not directly pertain to the central plot but instead serves to develop characters or provide comic relief. Such content rarely appears in recaps.
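
To make the transcript/recap pairing concrete, here is a minimal Python sketch. The `EpisodePair` class, its field names, and the toy dialogue are all hypothetical illustrations, not the dataset's actual schema or contents. The sketch only shows how a long transcript, including off-plot chatter, maps to a much shorter recap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodePair:
    """Hypothetical container for one example; not the dataset's real schema."""
    transcript: List[str]  # one entry per line of dialogue or scene description
    recap: str             # human-written plot summary


def compression_ratio(pair: EpisodePair) -> float:
    """Whitespace-token count of the transcript divided by that of the recap."""
    transcript_words = sum(len(line.split()) for line in pair.transcript)
    recap_words = len(pair.recap.split())
    return transcript_words / max(recap_words, 1)


# Invented toy example: the plot detail (the missing key) is split across
# lines 1, 2, and 4, while line 3 is off-plot small talk.
example = EpisodePair(
    transcript=[
        "ALICE: Did you lock the lab last night?",
        "BOB: I thought you had the key.",
        "ALICE: Anyway, how was the concert?",
        "BOB: Loud. The key is gone, by the way.",
    ],
    recap="The lab key goes missing.",
)

print(compression_ratio(example))
```

In this toy case the recap states in one sentence what the dialogue only implies across several non-adjacent lines, which is the kind of integration the list above describes.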