SB10k
SB10k is a corpus of German tweets annotated for sentiment analysis. Here are the key details:
- Corpus Size: It contains approximately 10,000 German tweets¹.
- Language: German.
- Task: Text classification, specifically sentiment analysis.
- Multilinguality: Monolingual (German only).
- Size Category: 1K to 10K examples.
- Tags: Sentiment analysis.
- License: CC-BY-4.0.
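As a minimal sketch of how the corpus might be inspected, the snippet below loads the dataset from the Hugging Face Hub with the third-party `datasets` library, using the repository id from source (1). The helper names `load_sb10k` and `label_distribution` and the column name `"Sentiment"` are assumptions made for illustration; check the actual split and column names on the dataset page before relying on them.

```python
from collections import Counter


def load_sb10k():
    """Download SB10k from the Hugging Face Hub (needs `pip install datasets`)."""
    # Imported lazily so the counting helper below works without the library.
    from datasets import load_dataset

    # Returns a DatasetDict mapping split names to splits; no split name is assumed.
    return load_dataset("Alienmaster/SB10k")


def label_distribution(rows, label_key="Sentiment"):
    """Count how often each label occurs in an iterable of dict-like rows.

    `label_key` is a hypothetical column name; pass the real one.
    """
    return Counter(row[label_key] for row in rows)
```

A possible use is to print each split's size and columns first, then count labels once the label column is known: `for name, split in load_sb10k().items(): print(name, len(split), split.column_names)`.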
The dataset was created by annotating German tweets, with each tweet labeled by three annotators. SB10k has been used to benchmark machine learning classifiers for sentiment analysis, including convolutional neural networks (CNNs) and feature-based support vector machines (SVMs)²³.
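With three annotators per tweet, a single label per tweet has to be derived somehow. The sketch below uses a simple majority vote; this is an assumption for illustration, not necessarily the aggregation used by the dataset authors. The function name `majority_label`, the label strings, and the `"unknown"` fallback are hypothetical.

```python
from collections import Counter


def majority_label(votes, fallback="unknown"):
    """Return the label chosen by most annotators, or `fallback` when tied or empty."""
    counts = Counter(votes).most_common()
    if not counts:
        return fallback
    # A tie for first place means there is no majority among the annotators.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


if __name__ == "__main__":
    print(majority_label(["positive", "positive", "neutral"]))
    print(majority_label(["positive", "negative", "neutral"]))
```

Returning a fallback on ties keeps ambiguous tweets visible instead of silently picking one annotator's label; such tweets can then be filtered out or reviewed separately.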
(1) Alienmaster/SB10k · Datasets at Hugging Face. https://huggingface.co/datasets/Alienmaster/SB10k.
(2) A Twitter Corpus and Benchmark Resources for German Sentiment Analysis. https://aclanthology.org/W17-1106/.
(3) A Twitter Corpus and Benchmark Resources for German Sentiment Analysis. https://aclanthology.org/W17-1106.pdf.