Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

Jiyun Kim, Byounghan Lee, Kyung-Ah Sohn

2022-11-01 | COLING 2022 10 | Hate Speech Detection

Abstract

In a hate speech detection model, we should consider two critical aspects in addition to detection performance: bias and explainability. Hate speech cannot be identified based solely on the presence of specific words: the model should be able to reason like humans and be explainable. To improve the performance concerning the two aspects, we propose Masked Rationale Prediction (MRP) as an intermediate task. MRP is a task to predict the masked human rationales (snippets of a sentence that are grounds for human judgment) by referring to surrounding tokens combined with their unmasked rationales. As the model learns its reasoning ability based on rationales by MRP, it performs hate speech detection robustly in terms of bias and explainability. The proposed method generally achieves state-of-the-art performance in various metrics, demonstrating its effectiveness for hate speech detection.
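
As a rough illustration of the masking step the abstract describes, the following sketch hides a random subset of per-token rationale labels and keeps the rest visible, so that a model could predict the hidden labels from the tokens plus the unmasked rationales. It is only a sketch under stated assumptions: the function name `mask_rationales`, the sentinel values `MASK` and `IGNORE`, the masking probability, and the toy sentence are all hypothetical and are not taken from the paper or its official code.

```python
import random

MASK = 2       # hypothetical sentinel: this token's rationale label is hidden
IGNORE = -100  # hypothetical marker: position is excluded from the MRP loss


def mask_rationales(rationales, mask_prob=0.5, rng=None):
    """Split per-token rationale labels into visible inputs and MRP targets.

    rationales: list of 0/1 labels, one per token (1 = part of a human rationale).
    Returns (visible, targets), both the same length as `rationales`.
    """
    if rng is None:
        rng = random.Random()
    visible, targets = [], []
    for label in rationales:
        if rng.random() < mask_prob:
            # Hide the label from the input; the model must predict it.
            visible.append(MASK)
            targets.append(label)
        else:
            # Keep the label as input context; no prediction is needed here.
            visible.append(label)
            targets.append(IGNORE)
    return visible, targets


# Toy example (made-up data): the last two tokens form the human rationale.
tokens = ["example", "post", "with", "offending", "phrase"]
rationales = [0, 0, 0, 1, 1]
visible, targets = mask_rationales(rationales, mask_prob=0.5, rng=random.Random(0))
for tok, vis, tgt in zip(tokens, visible, targets):
    print(tok, vis, tgt)
```

In a full setup, `tokens` and `visible` would be embedded and combined as the model input, and a token-level classification loss would be computed only at positions where `targets` is not `IGNORE`. How the paper combines the two inputs is not specified in the abstract.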

Results

Task                   Dataset     Metric    Value  Model
Abuse Detection        HateXplain  AUROC     0.862  BERT-MRP
Abuse Detection        HateXplain  Accuracy  0.704  BERT-MRP
Abuse Detection        HateXplain  Macro F1  0.699  BERT-MRP
Abuse Detection        HateXplain  AUROC     0.853  BERT-RP
Abuse Detection        HateXplain  Accuracy  0.707  BERT-RP
Abuse Detection        HateXplain  Macro F1  0.693  BERT-RP
Hate Speech Detection  HateXplain  AUROC     0.862  BERT-MRP
Hate Speech Detection  HateXplain  Accuracy  0.704  BERT-MRP
Hate Speech Detection  HateXplain  Macro F1  0.699  BERT-MRP
Hate Speech Detection  HateXplain  AUROC     0.853  BERT-RP
Hate Speech Detection  HateXplain  Accuracy  0.707  BERT-RP
Hate Speech Detection  HateXplain  Macro F1  0.693  BERT-RP
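
For readers unfamiliar with two of the metrics reported above, the sketch below shows how accuracy and macro F1 are conventionally computed from gold and predicted class labels. It uses made-up toy labels, not HateXplain data, and the helper names `accuracy` and `macro_f1` are hypothetical; it is not the evaluation script used for the reported numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (made up) for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because macro F1 averages per-class scores without weighting by class frequency, it can rank two models differently from accuracy, which is consistent with BERT-RP having slightly higher accuracy but lower macro F1 than BERT-MRP in the table.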

Related Papers

Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks (2025-07-15)
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models? (2025-06-15)
Towards Fairness Assessment of Dutch Hate Speech Detection (2025-06-14)
ToxSyn-PT: A Large-Scale Synthetic Dataset for Hate Speech Detection in Portuguese (2025-06-11)
Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models (2025-06-10)
Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models (2025-06-09)
Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification (2025-06-05)
On Fairness of Task Arithmetic: The Role of Task Vectors (2025-05-30)