Korean Multi-label Hate Speech Dataset

We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns.

The dataset consists of 109,692 utterances from Korean online news comments, labeled with 8 fine-grained hate speech classes.
Data collection period: January 2018 to June 2020.
It provides (a) binary classification and (b) fine-grained multi-label classification, with one to four labels per utterance.
(a) Binary classification: Hate Speech or Not Hate Speech.
(b) Fine-grained classification: Politics, Origin, Physical, Age, Gender, Religion, Race, and Profanity.

For the fine-grained classification, the Hate Speech class from the binary classification is broken down into the eight classes listed above, each corresponding to one hate speech category.
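
The relationship between the binary and fine-grained labels can be illustrated with a minimal Python sketch. It is not the dataset's official loader. The class order, the helper names `encode_multi_hot` and `to_binary`, and the use of an empty label list for Not Hate Speech are assumptions made for illustration; the released files may use a different index order and encoding.

```python
# Class names as listed in this description; the index order here is an
# assumption for illustration and may differ from the released dataset.
FINE_GRAINED_CLASSES = [
    "Politics", "Origin", "Physical", "Age",
    "Gender", "Religion", "Race", "Profanity",
]

MAX_LABELS = 4  # an utterance carries between one and four labels


def encode_multi_hot(labels):
    """Turn a list of class names into an 8-dimensional 0/1 vector."""
    label_set = set(labels)
    unknown = label_set - set(FINE_GRAINED_CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    if len(label_set) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels are expected")
    return [1 if name in label_set else 0 for name in FINE_GRAINED_CLASSES]


def to_binary(labels):
    """Collapse fine-grained labels into the binary task.

    Assumption: an empty list stands for a Not Hate Speech comment.
    """
    return "Hate Speech" if labels else "Not Hate Speech"


if __name__ == "__main__":
    example = ["Gender", "Profanity"]  # hypothetical annotation
    print(encode_multi_hot(example))   # multi-hot target for task (b)
    print(to_binary(example))          # label for task (a)
```

A multi-hot vector of this kind is a common target format for multi-label classifiers, where each class is predicted independently rather than as one of several mutually exclusive options.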

K-MHaS: Korean Multi-label Hate Speech Dataset