Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

Jiyun Kim, Byounghan Lee, Kyung-Ah Sohn

2022-11-01, COLING 2022 10, Hate Speech Detection

Abstract

In a hate speech detection model, we should consider two critical aspects in addition to detection performance: bias and explainability. Hate speech cannot be identified based solely on the presence of specific words: the model should be able to reason like humans and be explainable. To improve performance on these two aspects, we propose Masked Rationale Prediction (MRP) as an intermediate task. MRP is the task of predicting masked human rationales (snippets of a sentence that are the grounds for human judgment) by referring to the surrounding tokens combined with their unmasked rationales. Because the model learns its reasoning ability from rationales through MRP, it performs hate speech detection robustly in terms of bias and explainability. The proposed method generally achieves state-of-the-art performance in various metrics, demonstrating its effectiveness for hate speech detection.
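The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `MASK` sentinel, the `mask_rationales` function name, the 0/1 per-token rationale encoding, and the masking ratio are all assumptions made for illustration, and the way the paper combines token and rationale representations inside the model is not shown.

```python
import random

MASK = -1  # hypothetical sentinel marking a hidden rationale label


def mask_rationales(rationales, mask_ratio=0.5, seed=None):
    """Hide a random subset of per-token rationale labels.

    rationales: list of 0/1 labels, one per token (assumed encoding).
    Returns (masked, positions): `masked` is a copy with MASK at the
    hidden positions; `positions` lists the indices to be predicted.
    """
    rng = random.Random(seed)
    n = len(rationales)
    # Hide at least one label whenever the sentence is non-empty.
    k = max(1, round(n * mask_ratio)) if n else 0
    positions = sorted(rng.sample(range(n), k))
    masked = list(rationales)
    for i in positions:
        masked[i] = MASK
    return masked, positions


# Placeholder tokens and labels, invented for this example.
tokens = ["tok0", "tok1", "tok2", "tok3", "tok4"]
rationales = [0, 1, 0, 1, 1]

masked, positions = mask_rationales(rationales, mask_ratio=0.4, seed=0)

# Each token is paired with its (possibly hidden) rationale label;
# a model would be trained to recover the labels at `positions`.
model_input = list(zip(tokens, masked))
targets = {i: rationales[i] for i in positions}
```

In this sketch the unmasked labels stay visible next to their tokens, which mirrors the abstract's description of predicting masked rationales from surrounding tokens and their unmasked rationales.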

Results

Task	Dataset	Metric	Value	Model
Abuse Detection	HateXplain	AUROC	0.862	BERT-MRP
Abuse Detection	HateXplain	Accuracy	0.704	BERT-MRP
Abuse Detection	HateXplain	Macro F1	0.699	BERT-MRP
Abuse Detection	HateXplain	AUROC	0.853	BERT-RP
Abuse Detection	HateXplain	Accuracy	0.707	BERT-RP
Abuse Detection	HateXplain	Macro F1	0.693	BERT-RP
Hate Speech Detection	HateXplain	AUROC	0.862	BERT-MRP
Hate Speech Detection	HateXplain	Accuracy	0.704	BERT-MRP
Hate Speech Detection	HateXplain	Macro F1	0.699	BERT-MRP
Hate Speech Detection	HateXplain	AUROC	0.853	BERT-RP
Hate Speech Detection	HateXplain	Accuracy	0.707	BERT-RP
Hate Speech Detection	HateXplain	Macro F1	0.693	BERT-RP

Related Papers

Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks (2025-07-15)
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models? (2025-06-15)
Towards Fairness Assessment of Dutch Hate Speech Detection (2025-06-14)
ToxSyn-PT: A Large-Scale Synthetic Dataset for Hate Speech Detection in Portuguese (2025-06-11)
Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models (2025-06-10)
Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models (2025-06-09)
Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification (2025-06-05)
On Fairness of Task Arithmetic: The Role of Task Vectors (2025-05-30)