The rise in religiously motivated hate on social media is clear and ongoing. These platforms have become fertile ground for hate speech directed at religious communities, with tangible real-world repercussions. Most current research on the automated identification of hateful content on social media focuses on English-language content; low-resource languages such as Hindi have received comparatively little attention. As social media users increasingly express themselves in their regional languages, hate speech detection in these languages warrants dedicated research effort.

This work aims to fill that research gap by introducing a carefully curated and annotated dataset of YouTube comments in Hindi-English code-mixed language, designed specifically for identifying instances of religious hate.

Citation: Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. (https://doi.org/10.1145/3653017)

THAR Dataset