Hostility Detection in Hindi leveraging Pre-Trained Language Models

Ojasv Kamal, Adarsh Kumar, Tejas Vaidhya

2021-01-14 | Hate Speech Detection | Transfer Learning | Fake News Detection

Abstract

Hostile content on social media platforms is steadily increasing, creating a need to detect hostile posts reliably so that appropriate action can be taken against them. Although much recent work addresses hostile online content in English, comparable work in Indian languages is scarce. This paper presents a transfer-learning-based approach to classify social media posts (e.g., from Twitter and Facebook) written in Hindi in the Devanagari script as Hostile or Non-Hostile. Hostile posts are further analyzed to determine whether they fall into the Hateful, Fake, Defamation, and Offensive categories. The approach uses attention-based pre-trained models fine-tuned on Hindi data, with the Hostile/Non-Hostile task serving as an auxiliary task whose features are fused into the classifiers for the fine-grained sub-tasks. This yields a robust and consistent model without any ensembling or complex pre-processing. We report results from the CONSTRAINT-2021 Shared Task on hostile post detection, where our model placed 3rd runner-up in terms of Weighted Fine-Grained F1 Score.
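The feature-fusion idea in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not say how the features are fused, so concatenation is assumed here; the feature vectors, the `LinearHead` class, and the untrained random weights are hypothetical stand-ins for the fine-tuned encoders and classifiers, not the authors' implementation.

```python
import math
import random

# Fine-grained labels named in the abstract.
FINE_LABELS = ["hate", "fake", "defamation", "offensive"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(aux_features, task_features):
    """Fuse the two feature vectors by concatenation (an assumption)."""
    return list(aux_features) + list(task_features)


class LinearHead:
    """Hypothetical binary logistic head over the fused feature vector."""

    def __init__(self, dim, rng):
        # Untrained placeholder weights; a real head would be learned.
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.bias = 0.0

    def predict_proba(self, features):
        score = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return sigmoid(score)


def classify(aux_features, task_features, heads, threshold=0.5):
    """Return one independent yes/no decision per fine-grained label."""
    fused = fuse(aux_features, task_features)
    return {
        label: heads[label].predict_proba(fused) >= threshold
        for label in FINE_LABELS
    }


rng = random.Random(0)
aux = [0.2, -0.4, 0.9]   # stand-in for features from the Hostile/Non-Hostile model
task = [0.5, 0.1, -0.3]  # stand-in for features from a sub-task model
heads = {label: LinearHead(len(aux) + len(task), rng) for label in FINE_LABELS}
predictions = classify(aux, task, heads)
print(predictions)
```

Each label gets its own head because a hostile post may belong to more than one category; the auxiliary features give every head access to the coarse hostile signal.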

Results

Task	Dataset	Metric	Value	Model
Abuse Detection	Hostility Detection Dataset in Hindi	F1 score	0.5725	Auxiliary IndicBert
Fake News Detection	Hostility Detection Dataset in Hindi	F1 score	0.7741	Auxiliary IndicBert
Hate Speech Detection	Hostility Detection Dataset in Hindi	F1 score	0.5725	Auxiliary IndicBert
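
For readers unfamiliar with the ranking metric named in the abstract, the sketch below shows one plausible way to compute a weighted fine-grained F1: the per-label F1 scores of the four hostile categories, averaged with weights proportional to each label's support. The exact definition used by the shared-task organizers is assumed rather than taken from this page, and the toy labels are made up purely for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """Binary F1 for one label, given per-post sets of labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_fine_grained_f1(y_true, y_pred, labels):
    """Support-weighted average of per-label F1 (assumed metric definition)."""
    supports = {label: sum(1 for t in y_true if label in t) for label in labels}
    total = sum(supports.values())
    if total == 0:
        return 0.0
    weighted_sum = sum(
        supports[label] * f1_for_label(y_true, y_pred, label) for label in labels
    )
    return weighted_sum / total


labels = ["hate", "fake", "defamation", "offensive"]
# Toy gold and predicted label sets for four hostile posts (illustrative only).
y_true = [{"fake"}, {"hate", "offensive"}, {"defamation"}, {"fake"}]
y_pred = [{"fake"}, {"hate"}, {"defamation"}, {"hate"}]
score = weighted_fine_grained_f1(y_true, y_pred, labels)
print(round(score, 4))
```

Weighting by support means that frequent categories influence the score more than rare ones, which matters when the four hostile classes are imbalanced.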
