TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke

2020-10-23
Findings of the Association for Computational Linguistics 2020
Sentiment Analysis, General Classification, Classification, Language Modelling

Abstract

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. As a result, it is unclear what the current state of the art is, since there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting from existing pre-trained generic language models and continuing to train them on Twitter corpora.

Results

Task	Dataset	Metric	Value	Model
Sentiment Analysis	TweetEval	ALL	61.3	RoBERTa-Base
Sentiment Analysis	TweetEval	Emoji	30.9	RoBERTa-Base
Sentiment Analysis	TweetEval	Emotion	76.1	RoBERTa-Base
Sentiment Analysis	TweetEval	Hate	46.6	RoBERTa-Base
Sentiment Analysis	TweetEval	Irony	59.7	RoBERTa-Base
Sentiment Analysis	TweetEval	Offensive	79.5	RoBERTa-Base
Sentiment Analysis	TweetEval	Sentiment	71.3	RoBERTa-Base
Sentiment Analysis	TweetEval	Stance	68	RoBERTa-Base
Sentiment Analysis	TweetEval	ALL	61	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Emoji	29.3	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Emotion	72	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Hate	49.9	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Irony	65.4	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Offensive	77.1	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Sentiment	69.1	RoBERTa-Twitter
Sentiment Analysis	TweetEval	Stance	66.7	RoBERTa-Twitter
Sentiment Analysis	TweetEval	ALL	53.5	SVM
Sentiment Analysis	TweetEval	Emoji	29.3	SVM
Sentiment Analysis	TweetEval	Emotion	64.7	SVM
Sentiment Analysis	TweetEval	Hate	36.7	SVM
Sentiment Analysis	TweetEval	Irony	61.7	SVM
Sentiment Analysis	TweetEval	Offensive	52.3	SVM
Sentiment Analysis	TweetEval	Sentiment	62.9	SVM
Sentiment Analysis	TweetEval	Stance	67.3	SVM
Sentiment Analysis	TweetEval	ALL	58.1	FastText
Sentiment Analysis	TweetEval	Emoji	25.8	FastText
Sentiment Analysis	TweetEval	Emotion	65.2	FastText
Sentiment Analysis	TweetEval	Irony	63.1	FastText
Sentiment Analysis	TweetEval	Offensive	73.4	FastText
Sentiment Analysis	TweetEval	Sentiment	62.9	FastText
Sentiment Analysis	TweetEval	Stance	65.4	FastText
Sentiment Analysis	TweetEval	ALL	56.5	LSTM
Sentiment Analysis	TweetEval	Emoji	24.7	LSTM
Sentiment Analysis	TweetEval	Emotion	66	LSTM
Sentiment Analysis	TweetEval	Hate	52.6	LSTM
Sentiment Analysis	TweetEval	Irony	62.8	LSTM
Sentiment Analysis	TweetEval	Offensive	71.7	LSTM
Sentiment Analysis	TweetEval	Sentiment	58.3	LSTM
Sentiment Analysis	TweetEval	Stance	59.4	LSTM
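
The table above is in long format (one row per metric and model), which makes it hard to compare models on a given task. The following is a minimal sketch, not part of the paper or its official code, that groups such rows by model and finds the highest listed value per metric. Only the Hate and Irony rows are copied in to keep it short; the names `ROWS`, `pivot`, and `best_per_metric` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

# (metric, value, model) triples copied from the results table above.
# Only the Hate and Irony rows are included to keep the sketch short.
ROWS = [
    ("Hate", 46.6, "RoBERTa-Base"),
    ("Irony", 59.7, "RoBERTa-Base"),
    ("Hate", 49.9, "RoBERTa-Twitter"),
    ("Irony", 65.4, "RoBERTa-Twitter"),
    ("Hate", 36.7, "SVM"),
    ("Irony", 61.7, "SVM"),
    ("Irony", 63.1, "FastText"),  # the table lists no Hate row for FastText
    ("Hate", 52.6, "LSTM"),
    ("Irony", 62.8, "LSTM"),
]


def pivot(rows):
    """Group long-format rows into {model: {metric: value}}."""
    table = defaultdict(dict)
    for metric, value, model in rows:
        table[model][metric] = value
    return dict(table)


def best_per_metric(table):
    """Return {metric: (model, value)} for the highest listed value per metric."""
    best = {}
    for model, scores in table.items():
        for metric, value in scores.items():
            if metric not in best or value > best[metric][1]:
                best[metric] = (model, value)
    return best


if __name__ == "__main__":
    table = pivot(ROWS)
    for metric, (model, value) in sorted(best_per_metric(table).items()):
        print(f"{metric}: {model} ({value})")
```

Because missing rows are simply absent from the nested dictionary, a model with no listed value for a metric (FastText on Hate here) is skipped in the comparison rather than treated as zero.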
