Papers With Code 2 | ML Benchmarks, SotA Results & Code

Capriccio is a sentiment classification dataset on tweets that simulates data drift. It is created by slicing the Sentiment140 dataset (homepage, Huggingface datasets) with a sliding window of 500,000 tweets, resulting in 38 slices. Thus, each slice can be used to represent the training/validation dataset of a sentiment classification model that is re-trained every day. Each slice has 425,000 tweets for training (file named %d_train.json) and 75,000 tweets for validation (file named %d_val.json).

The name comes from the adjective capricious.