
The HateBR dataset is a significant resource for studying offensive language and hate speech detection in Brazilian Portuguese. Here are the key details about this dataset:

Collection and Annotation:
- The HateBR dataset was collected from Brazilian Instagram comments related to politicians.
- It was manually annotated by expert annotators.
- The dataset consists of 7,000 documents (Instagram comments).
Annotation Layers:
- The HateBR dataset includes annotations at three different levels:
  - Binary Classification: Comments are labeled as either offensive or non-offensive.
  - Offensiveness Levels: Comments are categorized as highly, moderately, or slightly offensive.
  - Hate Speech Targets: Comments are further classified into nine specific hate speech categories:
    - Xenophobia
    - Racism
    - Homophobia
    - Sexism
    - Religious intolerance
    - Partyism
    - Apology for the dictatorship
    - Antisemitism
    - Fatphobia
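The three annotation layers can be modeled as one label record per comment. The sketch below is a hypothetical Python representation: `HateBRLabel`, `Offensiveness`, and `HateTarget` are illustrative names, not part of the dataset's distribution. It assumes that only offensive comments carry an offensiveness level or hate speech targets, and that a comment may have more than one target. Both assumptions should be checked against the actual data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, Optional


class Offensiveness(Enum):
    """Offensiveness levels from the second annotation layer."""
    SLIGHTLY = "slightly"
    MODERATELY = "moderately"
    HIGHLY = "highly"


class HateTarget(Enum):
    """The nine hate speech targets from the third annotation layer."""
    XENOPHOBIA = "xenophobia"
    RACISM = "racism"
    HOMOPHOBIA = "homophobia"
    SEXISM = "sexism"
    RELIGIOUS_INTOLERANCE = "religious intolerance"
    PARTYISM = "partyism"
    APOLOGY_FOR_DICTATORSHIP = "apology for the dictatorship"
    ANTISEMITISM = "antisemitism"
    FATPHOBIA = "fatphobia"


@dataclass(frozen=True)
class HateBRLabel:
    """Hypothetical container holding all three annotation layers for one comment."""
    offensive: bool
    level: Optional[Offensiveness] = None
    targets: FrozenSet[HateTarget] = field(default_factory=frozenset)

    def __post_init__(self):
        # Assumed consistency rules: non-offensive comments carry no level or targets,
        # and every offensive comment carries an offensiveness level.
        if not self.offensive and (self.level is not None or self.targets):
            raise ValueError("non-offensive comments cannot have a level or hate targets")
        if self.offensive and self.level is None:
            raise ValueError("offensive comments need an offensiveness level")

    @property
    def is_hate_speech(self) -> bool:
        # Assumption: a comment counts as hate speech when at least one target is assigned.
        return bool(self.targets)
```

For example, `HateBRLabel(offensive=True, level=Offensiveness.HIGHLY, targets=frozenset({HateTarget.SEXISM}))` describes a highly offensive comment labeled with the sexism target.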
Inter-Annotator Agreement:
- Each comment was annotated by three different annotators to ensure reliability.
- The annotations reached high inter-annotator agreement.
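The summary does not say which agreement statistic HateBR reports. When every item receives the same number of ratings (three per comment here), Fleiss' kappa is a standard choice. The sketch below computes it from per-comment label lists; `fleiss_kappa` is a hypothetical helper, not code from the HateBR repository.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for items that each received the same number of ratings."""
    if not ratings:
        raise ValueError("need at least one item")
    n = len(ratings[0])  # raters per item (three per comment in HateBR)
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    num_items = len(ratings)
    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in item.values()) - n) / (n * (n - 1)) for item in counts
    ) / num_items

    # Expected (chance) agreement from the overall category proportions.
    totals: Counter = Counter()
    for item in counts:
        totals.update(item)
    p_e = sum((t / (num_items * n)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For example, `fleiss_kappa([["offensive"] * 3, ["non-offensive"] * 3])` returns 1.0 (perfect agreement), while values near 0 indicate agreement no better than chance.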
Baseline Performance:
- Baseline experiments using machine learning models achieved an F1-score of 85%, outperforming the baselines previously reported for Portuguese-language hate speech datasets.
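The summary does not specify whether the 85% F1-score is binary, macro-averaged, or weighted. The sketch below shows per-class F1 and its macro average for the binary offensive vs. non-offensive task. The helper names are hypothetical; if scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same macro average.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """F1 for one class, treating `positive` as the positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)
```

Macro averaging weights both classes equally, so a model cannot score well by favoring the majority class.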
Corpus and Models:
- The repository provides the annotated HateBR corpus.
- The repository contains the best models presented in the associated research paper.
File Format:
- The HateBr.csv file provides four columns:
  - 1st column: Instagram comments.
  - 2nd column: Offensive language classification (offensive vs. non-offensive).
  - 3rd column: Offensiveness level (highly, moderately, slightly offensive).
  - 4th column: Hate speech classification (nine different targets).
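The four-column layout can be read positionally with Python's standard `csv` module. This sketch assumes a comma-delimited, UTF-8 file with one header row. The exact header names and label encodings (text vs. numeric codes, and how multiple hate speech targets are joined) are not given here, so `HateBRRow` and `read_hatebr` are hypothetical names and all fields are kept as raw strings.

```python
import csv
from typing import Iterable, List, NamedTuple


class HateBRRow(NamedTuple):
    """One record with the four columns kept as raw strings (hypothetical field names)."""
    comment: str      # 1st column: Instagram comment text
    offensive: str    # 2nd column: offensive vs. non-offensive
    level: str        # 3rd column: offensiveness level
    hate_speech: str  # 4th column: hate speech target(s)


def read_hatebr(lines: Iterable[str], has_header: bool = True) -> List[HateBRRow]:
    """Parse HateBR-style CSV lines; assumes a comma delimiter and 4 columns per record."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    rows: List[HateBRRow] = []
    for fields in reader:
        if not fields:
            continue  # csv.reader yields an empty list for blank lines
        if len(fields) != 4:
            raise ValueError(
                f"line {reader.line_num}: expected 4 columns, got {len(fields)}"
            )
        rows.append(HateBRRow(*fields))
    return rows
```

Typical use would be `with open("HateBr.csv", encoding="utf-8", newline="") as f: rows = read_hatebr(f)`. Reading through `csv` rather than splitting on commas keeps quoted comments that contain commas or line breaks intact.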

Source: Conversation with Bing, 3/16/2024.
(1) HateBR - Offensive Language and Hate Speech Dataset in ... - GitHub. https://github.com/franciellevargas/HateBR
(2) ruanchaves/hatebr · Datasets at Hugging Face. https://huggingface.co/datasets/ruanchaves/hatebr
(3) Papers with Code - HateBR: Large expert annotated corpus of Brazilian .... https://paperswithcode.com/paper/hatebr-large-expert-annotated-corpus-of