Reddit Engagement Dataset

Texts | MIT License | Introduced 2022-10-22

The Reddit Engagement Dataset (RED) is a distant-supervision dataset of 80k single-turn conversations. RED is sourced from Reddit by sampling 43 popular subreddits. It was processed from a total of 5 million posts, filtering out data that was non-conversational, toxic, or whose popularity could not be ascertained.
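The three filters described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the `Post` record, its field names, the `keep` and `filter_posts` helpers, and the idea that toxicity arrives as a precomputed flag are all assumptions made for the example. It also assumes that a single-turn conversation means a post paired with one reply.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Post:
    # Hypothetical record layout; the actual RED schema is not given here.
    subreddit: str
    text: str
    reply: Optional[str]   # a reply to the post, or None if there is none
    score: Optional[int]   # popularity signal, or None if it is unknown
    is_toxic: bool         # assumed to come from an external toxicity check


def keep(post: Post, allowed_subreddits: Set[str]) -> bool:
    """Return True if a post survives all of the filters sketched here."""
    if post.subreddit not in allowed_subreddits:
        return False  # not one of the sampled subreddits
    if post.reply is None or not post.reply.strip():
        return False  # non-conversational: no usable reply
    if post.is_toxic:
        return False  # flagged as toxic
    if post.score is None:
        return False  # popularity cannot be ascertained
    return True


def filter_posts(posts: Iterable[Post], allowed_subreddits: Set[str]) -> List[Post]:
    """Keep only the posts that pass every filter, preserving input order."""
    return [p for p in posts if keep(p, allowed_subreddits)]
```

Calling `filter_posts(posts, {"AskReddit"})` on a list of `Post` records would return only the entries that have a reply, are not flagged as toxic, and carry a known score.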

Source: EnDex: Evaluation of Dialogue Engagingness at Scale

Image Source: https://arxiv.org/pdf/2210.12362v1.pdf