PT Hate Speech

Introduced 2019-08-01

PT Hate Speech is a dataset for studying hate speech in Portuguese. Its key details are:

  1. Composition:

    • The dataset consists of 5,668 tweets written in Portuguese.
    • Two groups of annotators with different levels of expertise labeled the tweets, each group using its own scheme.
  2. Annotation Schemes:

    • Non-experts initially annotated the tweets using binary labels: either 'hate' or 'no-hate'.
    • Expert annotators then classified the tweets using a fine-grained, hierarchical multi-label scheme with 81 hate speech categories in total.
  3. Hierarchical Annotation Scheme:

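    • Because a tweet can carry several labels at different levels of the hierarchy, the scheme can identify distinct types of hate speech and the cases where they intersect.
    • Inter-annotator agreement varied from category to category, reflecting that some types of hate speech are subtler than others and that detecting them depends on personal perception.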
  4. Usefulness and Baseline Experiment:

    • To demonstrate the dataset's usefulness, a baseline classification experiment was conducted using pre-trained word embeddings and LSTM models.
    • The experiment used the binary-labeled data, and the authors report a state-of-the-art outcome.
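
To make the hierarchical multi-label idea concrete, here is a minimal Python sketch of how labels at different levels of a hierarchy might be handled. The category names and the `PARENT` mapping are hypothetical placeholders, not the dataset's actual 81 categories, and the helper functions are illustrative rather than part of any released tooling.

```python
# Hypothetical toy hierarchy mapping each child category to its parent.
# These names are illustrative only and are NOT the dataset's real categories.
PARENT = {
    "sexism": "hate",
    "racism": "hate",
    "women": "sexism",
    "migrants": "racism",
}


def with_ancestors(labels):
    """Return the given labels plus every ancestor up to the root."""
    expanded = set()
    for label in labels:
        # Walk up the hierarchy until the root or an already-visited node.
        while label is not None and label not in expanded:
            expanded.add(label)
            label = PARENT.get(label)
    return expanded


def is_hate(labels):
    """Derive a binary label: any category under the root counts as hate."""
    return "hate" in with_ancestors(labels)


# A tweet tagged with two fine-grained categories (an intersection).
tweet_labels = {"women", "migrants"}
print(sorted(with_ancestors(tweet_labels)))
print(is_hate(tweet_labels), is_hate(set()))
```

Under these assumptions, a tweet tagged with two leaf categories also counts toward both parent categories, which is one simple way to represent intersections, and an unlabeled tweet falls back to the binary 'no-hate' case.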

Source: A Hierarchically-Labeled Portuguese Hate Speech Dataset. https://aclanthology.org/W19-3510/ (PDF: https://aclanthology.org/W19-3510.pdf; Papers with Code: https://paperswithcode.com/paper/a-hierarchically-labeled-portuguese-hate)