SODA

Dialog | CC-BY-4.0 | Introduced 2022-12-20

SODA is a high-quality social dialogue dataset. In contrast to most existing dialogue corpora, which are crowdsourced and small in scale, SODA contains 1.5M socially grounded dialogues distilled from a pre-trained language model (InstructGPT; Ouyang et al.). The dialogues are distilled by contextualizing social commonsense knowledge drawn from a knowledge graph (Atomic10x).

Source: SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Image Source: https://arxiv.org/pdf/2212.10465v1.pdf