HateBR
Introduced 2021-03-27
The HateBR dataset is a corpus for studying offensive language and hate speech detection in Brazilian Portuguese. Its key details are:
Collection and Annotation:
- The HateBR dataset was collected from Brazilian Instagram comments related to politicians.
- Each comment was manually annotated by specialists.
- The dataset consists of 7,000 documents.
Annotation Layers:
- The HateBR dataset includes annotations at three different levels:
  - Binary Classification: Comments are labeled as either offensive or non-offensive.
  - Offensiveness Levels: Comments are categorized as highly, moderately, or slightly offensive.
  - Hate Speech Targets: Comments are further classified into nine hate speech categories:
    - Xenophobia
    - Racism
    - Homophobia
    - Sexism
    - Religious intolerance
    - Partyism
    - Apology for the dictatorship
    - Antisemitism
    - Fatphobia
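The three annotation layers can be pictured as one record per comment. The sketch below is illustrative only: the class and field names are hypothetical, and two rules it encodes are assumptions rather than facts from the dataset (that a comment may carry more than one hate speech target, and that a non-offensive comment has no level or target).

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class OffensivenessLevel(Enum):
    SLIGHTLY = "slightly"
    MODERATELY = "moderately"
    HIGHLY = "highly"


class HateSpeechTarget(Enum):
    XENOPHOBIA = "xenophobia"
    RACISM = "racism"
    HOMOPHOBIA = "homophobia"
    SEXISM = "sexism"
    RELIGIOUS_INTOLERANCE = "religious intolerance"
    PARTYISM = "partyism"
    APOLOGY_FOR_THE_DICTATORSHIP = "apology for the dictatorship"
    ANTISEMITISM = "antisemitism"
    FATPHOBIA = "fatphobia"


@dataclass(frozen=True)
class AnnotatedComment:
    """Hypothetical in-memory view of one comment and its three layers."""
    text: str
    offensive: bool                                # layer 1: binary label
    level: Optional[OffensivenessLevel] = None     # layer 2: offensiveness level
    targets: FrozenSet[HateSpeechTarget] = frozenset()  # layer 3: targets

    def __post_init__(self):
        # Assumption: the finer layers only apply to offensive comments.
        if not self.offensive and (self.level is not None or self.targets):
            raise ValueError("non-offensive comments carry no level or target")


# Invented examples, not taken from the corpus.
neutral = AnnotatedComment(text="example comment", offensive=False)
flagged = AnnotatedComment(
    text="another example",
    offensive=True,
    level=OffensivenessLevel.MODERATELY,
    targets=frozenset({HateSpeechTarget.PARTYISM}),
)
```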
Inter-Annotator Agreement:
- Each comment was annotated by three different annotators to ensure reliability.
- The dataset achieved high inter-annotator agreement.
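Agreement among a fixed number of annotators is commonly summarized with a chance-corrected statistic. The sketch below computes Fleiss' kappa, one standard choice for three or more raters. The text above does not say which statistic was used for HateBR, and the labels in the example are invented.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Every item must have the same number of raters, and at least two.
    """
    if not ratings:
        raise ValueError("no items to score")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number of raters (>= 2)")

    category_totals = Counter()
    observed_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        observed_sum += agreeing / (n_raters * (n_raters - 1))

    observed = observed_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance, from the overall label proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


# Four invented comments, each labeled by three annotators.
example = [
    ["offensive", "offensive", "offensive"],
    ["non-offensive", "non-offensive", "non-offensive"],
    ["offensive", "offensive", "non-offensive"],
    ["non-offensive", "non-offensive", "non-offensive"],
]
kappa = fleiss_kappa(example)
```

A value of 1 means perfect agreement, and 0 means agreement no better than chance.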
Baseline Performance:
- Baseline experiments using machine learning models achieved an F1-score of 85%, outperforming existing baselines on Portuguese-language hate speech datasets.
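For reference, the F1-score is the harmonic mean of precision and recall. A minimal sketch for the binary offensive vs. non-offensive task follows. The labels are invented, the function name is hypothetical, and the text above does not say whether the reported 85% is a binary, macro, or weighted average.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1-score for one positive class, computed from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented gold labels and predictions (1 = offensive, 0 = non-offensive).
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = binary_f1(gold, predicted)
```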
Corpus and Models:
- The repository provides the corpus of annotated comments.
- It also contains the best models presented in the associated research paper.
File Format:
- The HateBr.csv file provides four columns:
  - 1st column: Instagram comments.
  - 2nd column: Offensive language classification (offensive vs. non-offensive).
  - 3rd column: Offensiveness level (highly, moderately, slightly offensive).
  - 4th column: Hate speech classification (nine different targets).
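The four-column layout can be read by position with the standard library. This is a sketch under stated assumptions: the column names below are hypothetical, whether the real file has a header row is not stated here, and the sample rows are invented stand-ins for the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical names for the four columns described above; the actual header
# in HateBr.csv may differ.
COLUMNS = ("comment", "offensive", "offensiveness_level", "hate_speech")


def read_rows(handle, has_header=True):
    """Yield one dict per row, mapping the first four fields by position."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row, assuming the file has one
    for fields in reader:
        if len(fields) < len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, fields[:len(COLUMNS)]))


# Tiny in-memory sample with invented values, used only to exercise the reader.
sample = io.StringIO(
    "c1,c2,c3,c4\n"
    "comment text A,1,2,-1\n"
    "comment text B,0,0,-1\n"
)
rows = list(read_rows(sample))
label_counts = Counter(row["offensive"] for row in rows)
```

To read the real file, pass an open file handle instead of `sample`, for example `open("HateBr.csv", newline="", encoding="utf-8")`. The encoding is also an assumption.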
Source: Conversation with Bing, 3/16/2024
- (1) HateBR - Offensive Language and Hate Speech Dataset in ... - GitHub. https://github.com/franciellevargas/HateBR
- (2) ruanchaves/hatebr · Datasets at Hugging Face. https://huggingface.co/datasets/ruanchaves/hatebr
- (3) Papers with Code - HateBR: Large expert annotated corpus of Brazilian .... https://paperswithcode.com/paper/hatebr-large-expert-annotated-corpus-of