K-MHaS: Korean Multi-label Hate Speech Dataset
We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns.
- consisting of 109,692 utterances from Korean online news comments, labeled with 8 fine-grained hate speech classes.
- data collection period: January 2018 to June 2020.
- providing (a) binary classification and (b) multi-label classification with one to four labels per utterance:
  - (a) binary classification: Hate Speech or Not Hate Speech
  - (b) fine-grained classification: Politics, Origin, Physical, Age, Gender, Religion, Race, and Profanity.
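To make the multi-label setup concrete, here is a minimal sketch of encoding one utterance's labels as a multi-hot vector over the eight fine-grained classes. It is not part of the dataset's tooling. The helper name `to_multi_hot`, the class index order (taken from the list above), and the rule that "Not Hate Speech" appears alone and maps to an all-zero vector are all illustrative assumptions. Check them against the released files before relying on them.

```python
from typing import Sequence

# The eight fine-grained classes in the order listed above.
# ASSUMPTION: this index order is for illustration only; the released files may differ.
FINE_GRAINED_CLASSES = (
    "Politics", "Origin", "Physical", "Age",
    "Gender", "Religion", "Race", "Profanity",
)
NOT_HATE = "Not Hate Speech"


def to_multi_hot(labels: Sequence[str]) -> list[int]:
    """Encode an utterance's labels as an 8-dimensional multi-hot vector.

    Hypothetical helper. ASSUMPTION: an utterance carries either the single
    label "Not Hate Speech" (encoded as all zeros) or one to four
    fine-grained hate speech classes.
    """
    unique = set(labels)
    if unique == {NOT_HATE}:
        return [0] * len(FINE_GRAINED_CLASSES)
    if NOT_HATE in unique:
        raise ValueError("'Not Hate Speech' cannot be combined with other labels")
    if not 1 <= len(unique) <= 4:
        raise ValueError(f"expected 1 to 4 labels, got {len(unique)}")
    unknown = unique - set(FINE_GRAINED_CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # One position per class: 1 if the class is present, 0 otherwise.
    return [1 if c in unique else 0 for c in FINE_GRAINED_CLASSES]


print(to_multi_hot(["Gender", "Profanity"]))  # [0, 0, 0, 0, 1, 0, 0, 1]
```

A fixed-length multi-hot vector pairs naturally with one sigmoid output per class, so a model can predict several categories for the same comment at once.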
For the fine-grained classification, the Hate Speech class of the binary classification is broken down into eight classes, each corresponding to a hate speech category.
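Because the eight fine-grained classes subdivide the single Hate Speech class, a binary label can be recovered from fine-grained labels. The following is a minimal sketch of that mapping. The helper name `to_binary` is hypothetical, and treating any label outside the eight categories as "Not Hate Speech" is an assumption made for illustration.

```python
from typing import Sequence

# The eight fine-grained hate speech categories listed above.
FINE_GRAINED_CLASSES = frozenset({
    "Politics", "Origin", "Physical", "Age",
    "Gender", "Religion", "Race", "Profanity",
})


def to_binary(labels: Sequence[str]) -> str:
    """Collapse fine-grained labels into the binary task (hypothetical helper).

    Any of the eight categories maps back to the single "Hate Speech" class.
    ASSUMPTION: anything else (e.g. "Not Hate Speech") maps to "Not Hate Speech".
    """
    if FINE_GRAINED_CLASSES.intersection(labels):
        return "Hate Speech"
    return "Not Hate Speech"


print(to_binary(["Age", "Race"]))        # Hate Speech
print(to_binary(["Not Hate Speech"]))    # Not Hate Speech
```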