The TweetSentBR dataset is a sentiment analysis corpus for Brazilian Portuguese. Its main characteristics are summarized below:

Description:
- The dataset consists of 15,000 manually annotated sentences extracted from tweets in Brazilian Portuguese.
- These sentences are specifically related to the TV show domain.
- Each sentence is labeled with one of three sentiment classes: positive, neutral, or negative.
- Human annotators labeled the sentences following guidelines from the literature to ensure annotation reliability.
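
The three-class labeling scheme can be represented with a small data structure. This is only an illustrative sketch: the `Sentiment` and `AnnotatedSentence` names, the string label values and the toy sentences are hypothetical and do not reflect the corpus's actual file format, which this summary does not describe.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    """The three TweetSentBR classes; the string values are assumed, not official."""
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


@dataclass(frozen=True)
class AnnotatedSentence:
    """Hypothetical record: one tweet sentence paired with its manual label."""
    text: str
    label: Sentiment


def class_distribution(corpus: list[AnnotatedSentence]) -> dict[Sentiment, int]:
    """Count sentences per class, reporting zero for classes that never occur."""
    counts = Counter(item.label for item in corpus)
    # Counter returns 0 for missing keys, so every class appears in the result.
    return {label: counts[label] for label in Sentiment}


# Invented toy examples in the TV show domain (not taken from the corpus).
TOY_CORPUS = [
    AnnotatedSentence("Adorei o episódio de hoje!", Sentiment.POSITIVE),
    AnnotatedSentence("O programa começa às 22h.", Sentiment.NEUTRAL),
    AnnotatedSentence("Que final horrível.", Sentiment.NEGATIVE),
    AnnotatedSentence("Melhor novela do ano!", Sentiment.POSITIVE),
]

if __name__ == "__main__":
    for label, count in class_distribution(TOY_CORPUS).items():
        print(f"{label.value}: {count}")
```

On the toy data this prints one negative, one neutral and two positive sentences, in the enum's declaration order. Checking the class balance this way is a common first step before training, since a skewed distribution affects how accuracy should be read.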

Purpose:
- Researchers and practitioners in the field of Natural Language Processing (NLP) use this dataset for sentiment analysis tasks.
- It serves as a benchmark for developing and evaluating new sentiment classification methods.

Performance:
- The original paper reports baseline polarity classification experiments with three machine learning methods, reaching:
  - Binary classification (positive vs. negative): 80.99% F-Measure and 82.06% accuracy.
  - Three-point classification (positive, neutral, negative): 59.85% F-Measure and 64.62% accuracy.
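
The two evaluation settings and metrics above can be sketched as follows. This is a minimal sketch under stated assumptions. The binary task is assumed to be the three-class data with neutral sentences removed. F-Measure is assumed to be the macro-average of per-class F1. This summary gives neither the paper's exact protocol nor its averaging scheme, and the helper names (`accuracy`, `macro_f1`, `drop_neutral`) are hypothetical.

```python
LABELS_3 = ("negative", "neutral", "positive")  # three-point task
LABELS_2 = ("negative", "positive")             # binary task (assumed: neutral dropped)


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str], labels: tuple[str, ...]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; F1 is 0 when undefined)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def drop_neutral(gold: list[str], pred: list[str]) -> tuple[list[str], list[str]]:
    """Binary view (assumption): keep only items whose gold label is not neutral."""
    kept = [(g, p) for g, p in zip(gold, pred) if g != "neutral"]
    return [g for g, _ in kept], [p for _, p in kept]


if __name__ == "__main__":
    # Invented gold labels and predictions, for illustration only.
    gold = ["positive", "negative", "neutral", "positive", "negative"]
    pred = ["positive", "neutral", "neutral", "negative", "negative"]
    print("3-point:", accuracy(gold, pred), round(macro_f1(gold, pred, LABELS_3), 4))
    bin_gold, bin_pred = drop_neutral(gold, pred)
    print("binary:", accuracy(bin_gold, bin_pred), round(macro_f1(bin_gold, bin_pred, LABELS_2), 4))
```

Running the block prints accuracy 0.6 and macro-F1 0.6111 for the three-point toy data, and 0.5 and 0.5833 for the binary view. In a genuine binary setup the classifier would be trained without the neutral class and could not predict `neutral`. The toy prediction only shows that such an output counts as an error. With an explicit `labels` list, scikit-learn's `f1_score(gold, pred, labels=..., average="macro")` should give the same macro-F1 values.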

Source: Conversation with Bing, 3/16/2024.
(1) Building a Sentiment Corpus of Tweets in Brazilian Portuguese. arXiv. https://arxiv.org/abs/1712.08917
(2) 7 Best Portuguese Language Speech Datasets of 2022. Twine. https://www.twine.net/blog/portuguese-language-speech-datasets/
(3) A survey and study impact of tweet sentiment analysis via ... Springer. https://link.springer.com/article/10.1007/s10579-023-09687-8
(4) Top 25 Twitter Datasets for NLP and Machine Learning. iMerit. https://imerit.net/blog/top-25-twitter-datasets-for-natural-language-processing-and-machine-learning-all-pbm/
(5) Building a Sentiment Corpus of Tweets in Brazilian Portuguese (PDF). arXiv. https://arxiv.org/pdf/1712.08917v1.pdf
(6) Building a Sentiment Corpus of Tweets in Brazilian Portuguese (DOI). https://doi.org/10.48550/arXiv.1712.08917
