DeToxy
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Speech
Introduced 2021-10-14
DeToxy is a publicly available, toxicity-annotated dataset for the English language. It is sourced from various openly available speech databases and consists of over 2 million utterances. The dataset is intended to serve as a benchmark for the relatively new and unexplored Spoken Language Processing task of detecting toxicity in spoken utterances, and to encourage further research in this area.