tweetSentBR

Introduced 2017-12-24

TweetSentBR is a sentiment analysis corpus of tweets written in Brazilian Portuguese.

  1. Description:

    • The dataset consists of 15,000 manually annotated sentences extracted from tweets in Brazilian Portuguese.
    • All sentences come from the TV show domain.
    • Each sentence is labeled with one of three sentiment classes: positive, neutral, or negative.
    • The annotation followed guidelines from the literature to ensure reliability.
  2. Purpose:

    • Natural Language Processing (NLP) researchers and practitioners use the dataset for sentiment analysis tasks.
    • It serves as a benchmark for developing and evaluating new sentiment classification methods.
  3. Performance:

    • Baseline experiments on polarity classification using three machine learning methods achieved the following results:
      • Binary classification (positive vs. negative): 80.99% F-Measure and 82.06% accuracy.
      • Three-point classification (positive, neutral, negative): 59.85% F-Measure and 64.62% accuracy.
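The two evaluation settings above (binary and three-point) can be made concrete with a small sketch. It assumes that the binary setting keeps only sentences whose gold label is positive or negative, and that "F-Measure" means the macro-averaged F1 over the classes in play; the source does not state which averaging the original baselines used. The label strings and the helper names (to_binary, accuracy, macro_f1) are hypothetical and not part of any official TweetSentBR tooling.

```python
# Hypothetical label strings; the distributed files may encode classes differently.
LABELS_3 = ("positive", "neutral", "negative")
LABELS_2 = ("positive", "negative")


def to_binary(gold, pred):
    """Keep only examples whose gold label is positive or negative.

    Assumption: this mirrors the binary setting. A "neutral" prediction on a
    kept example counts as an error.
    """
    pairs = [(g, p) for g, p in zip(gold, pred) if g in LABELS_2]
    return [g for g, _ in pairs], [p for _, p in pairs]


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy gold labels and predictions, invented for illustration only.
    gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "positive", "negative", "negative", "neutral"]

    print("3-class accuracy:", round(accuracy(gold, pred), 4))
    print("3-class macro F1:", round(macro_f1(gold, pred, LABELS_3), 4))

    bin_gold, bin_pred = to_binary(gold, pred)
    print("binary accuracy:", round(accuracy(bin_gold, bin_pred), 4))
    print("binary macro F1:", round(macro_f1(bin_gold, bin_pred, LABELS_2), 4))
```

Macro averaging weights each class equally, so a poorly predicted minority class pulls the score down even when overall accuracy is high. That is one plausible reason the three-point F-Measure sits further below its accuracy than in the binary case, though the source does not confirm the averaging used.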
