StockEmotions
This repository contains a financial-domain dataset for financial sentiment/emotion classification and stock market time series prediction. It accompanies our paper, "StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series", accepted at the AAAI 2023 Bridge (AI for Financial Services).
- Data collection period: Jan 2020 - Dec 2020
- Number of utterances: 10,000 (train 80%, val 10%, test 10%)
- Sentiment classes: 2 [bullish (~positive), bearish (~negative)]
- Emotion classes: 12 [ambiguous, amusement, anger, anxiety, belief, confusion, depression, disgust, excitement, optimism, panic, surprise]
- tweet/processed.csv: 50,281 samples of text-processed data for topic modelling
- tweet/train, val, test.csv: 10,000 samples in total. Each file has the columns id, date, ticker, emo_label, senti_label, original, and processed. For details on data curation, processing (e.g. emoji, CTAG, HTAG), and annotation, see our paper. These files are used for the financial sentiment/emotion classification tasks.
- price/38 companies: historical price data in CSV format. The tweet and price datasets together are used for the multivariate time series tasks.
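
As a quick sanity check on the split files, the sketch below reads one CSV, counts the sentiment and emotion labels, flags any label outside the classes listed above, and tallies sentiment per (date, ticker) as a possible input feature for the time series tasks. It uses only the Python standard library. The file names `tweet/train.csv`, `tweet/val.csv`, and `tweet/test.csv` are assumptions, as are all helper names (`load_split`, `find_column`, `label_counts`, `unknown_labels`, `daily_sentiment`); the sentiment column is looked up by its `senti` prefix rather than hard-coded, so check the header of your copy.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path

# Label sets as listed in this README.
SENTIMENTS = {"bullish", "bearish"}
EMOTIONS = {
    "ambiguous", "amusement", "anger", "anxiety", "belief", "confusion",
    "depression", "disgust", "excitement", "optimism", "panic", "surprise",
}


def load_split(path):
    """Read one split file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def find_column(rows, prefix):
    """Return the first column whose name starts with `prefix`."""
    for name in rows[0]:
        if name.startswith(prefix):
            return name
    raise KeyError(f"no column starting with {prefix!r}")


def label_counts(rows, column):
    """Count how often each label occurs in `column`."""
    return Counter(row[column] for row in rows)


def unknown_labels(rows, column, allowed):
    """Return labels found in `column` that are not in `allowed`."""
    return set(label_counts(rows, column)) - allowed


def daily_sentiment(rows, senti_col):
    """Tally sentiment labels per (date, ticker) pair."""
    daily = defaultdict(Counter)
    for row in rows:
        daily[(row["date"], row["ticker"])][row[senti_col]] += 1
    return daily


if __name__ == "__main__":
    root = Path("tweet")  # assumed location of the split files
    for name in ("train", "val", "test"):
        path = root / f"{name}.csv"  # assumed file name
        if not path.exists():
            print(f"{path} not found, skipping")
            continue
        rows = load_split(path)
        senti_col = find_column(rows, "senti")
        print(name, len(rows), dict(label_counts(rows, senti_col)))
        print("  unexpected emotions:", unknown_labels(rows, "emo_label", EMOTIONS))
```

The per-day tallies returned by `daily_sentiment` are keyed by date and ticker, so they could be joined with the price data on those same two fields. How the price files name their date column is not stated here, so that join is left out.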