K-MHaS: Korean Multi-label Hate Speech Dataset

Texts · cc-by-sa-4.0 · Introduced 2022-08-23

We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns.

  • Consists of 109,692 utterances from Korean online news comments, labeled with 8 fine-grained hate speech classes.

  • Data collection period: January 2018 to June 2020.

  • Provides (a) binary classification and (b) multi-label classification with one to four labels per utterance.

  • (a) binary classification: Hate Speech or Not Hate Speech

  • (b) fine-grained classification: Politics, Origin, Physical, Age, Gender, Religion, Race, and Profanity.

For the fine-grained classification, the Hate Speech class from the binary classification is broken down into eight classes, each corresponding to a hate speech category.
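
As an illustration of how the two label schemes relate, the following is a minimal Python sketch that encodes a comment's fine-grained labels as a multi-hot vector and derives the binary label from them. The class order follows the list above and is an assumption, as are the helper names `to_multi_hot` and `to_binary`; the released data files may use a different index order and encoding.

```python
from typing import List

# Class names in the order listed above.
# ASSUMPTION: this index order is illustrative and may not match the released files.
FINE_GRAINED_CLASSES = [
    "Politics", "Origin", "Physical", "Age",
    "Gender", "Religion", "Race", "Profanity",
]

# The description states that hate speech comments carry one to four labels.
MAX_LABELS = 4


def to_multi_hot(labels: List[str]) -> List[int]:
    """Encode fine-grained labels as an 8-dimensional multi-hot vector (hypothetical helper)."""
    unique = set(labels)
    unknown = unique - set(FINE_GRAINED_CLASSES)
    if unknown:
        raise ValueError(f"Unknown classes: {sorted(unknown)}")
    if len(unique) > MAX_LABELS:
        raise ValueError(f"Expected at most {MAX_LABELS} labels, got {len(unique)}")
    return [1 if name in unique else 0 for name in FINE_GRAINED_CLASSES]


def to_binary(labels: List[str]) -> str:
    """Derive the binary label: any fine-grained label implies Hate Speech (hypothetical helper)."""
    return "Hate Speech" if labels else "Not Hate Speech"


if __name__ == "__main__":
    example = ["Gender", "Profanity"]
    print(to_multi_hot(example))  # [0, 0, 0, 0, 1, 0, 0, 1]
    print(to_binary(example))     # Hate Speech
    print(to_binary([]))          # Not Hate Speech
```

A multi-hot vector is a common choice for multi-label classification because each class can then be predicted independently, for example with one sigmoid output per class.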