PreBit: Multimodal dataset for Bitcoin price

Introduced 2022-05-30

This is the dataset accompanying the paper: "PreBit - A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin"

Zou, Y., & Herremans, D. (2023). PreBit - A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin. Expert Systems with Applications, 120838.

The dataset contains tweets that have already been pre-processed. The tweets from each day have been concatenated and split into text slices of 200 word tokens, where each slice begins with the last 50 tokens of the previous slice. The date of the tweets has been added to each slice. Note that tweets from January to March 2019 are currently missing from the dataset.
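The slicing scheme described above can be sketched as a sliding window. This is only an illustration, not the code used to build the dataset: the function name `slice_tokens` is hypothetical, the example tokens are synthetic, and it assumes that the final slice of a day is kept even when it is shorter than 200 tokens, which the description does not specify.

```python
def slice_tokens(tokens, size=200, overlap=50):
    """Split a list of tokens into windows of `size` tokens,
    where each window starts with the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 150 new tokens per slice with the default values
    slices = []
    for start in range(0, len(tokens), step):
        slices.append(tokens[start:start + size])
        # Stop once this window has reached the end of the day's tokens
        # (assumption: a shorter final slice is kept rather than dropped).
        if start + size >= len(tokens):
            break
    return slices


# Synthetic stand-in for one day's concatenated, pre-processed tweets.
day_tokens = [f"w{i}" for i in range(450)]
slices = slice_tokens(day_tokens)
print([len(s) for s in slices])  # [200, 200, 150]
```

With these defaults, consecutive slices advance by 150 tokens, so the first 50 tokens of a slice repeat the last 50 tokens of the slice before it.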

The dataset also contains the Bitcoin price data that we used to create the labels and to build the SVM models based on price-volume data.

The GitHub repository for the project can be found here. It is currently being reorganised.