
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke

Date: 2020-10-23
Venue: Findings of the Association for Computational Linguistics 2020
Tasks: Sentiment Analysis, General Classification, Classification, Language Modelling

Abstract

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.

Results

Task                Dataset    Metric     Value  Model
Sentiment Analysis  TweetEval  ALL        61.3   RoBERTa-Base
Sentiment Analysis  TweetEval  Emoji      30.9   RoBERTa-Base
Sentiment Analysis  TweetEval  Emotion    76.1   RoBERTa-Base
Sentiment Analysis  TweetEval  Hate       46.6   RoBERTa-Base
Sentiment Analysis  TweetEval  Irony      59.7   RoBERTa-Base
Sentiment Analysis  TweetEval  Offensive  79.5   RoBERTa-Base
Sentiment Analysis  TweetEval  Sentiment  71.3   RoBERTa-Base
Sentiment Analysis  TweetEval  Stance     68     RoBERTa-Base
Sentiment Analysis  TweetEval  ALL        61     RoBERTa-Twitter
Sentiment Analysis  TweetEval  Emoji      29.3   RoBERTa-Twitter
Sentiment Analysis  TweetEval  Emotion    72     RoBERTa-Twitter
Sentiment Analysis  TweetEval  Hate       49.9   RoBERTa-Twitter
Sentiment Analysis  TweetEval  Irony      65.4   RoBERTa-Twitter
Sentiment Analysis  TweetEval  Offensive  77.1   RoBERTa-Twitter
Sentiment Analysis  TweetEval  Sentiment  69.1   RoBERTa-Twitter
Sentiment Analysis  TweetEval  Stance     66.7   RoBERTa-Twitter
Sentiment Analysis  TweetEval  ALL        53.5   SVM
Sentiment Analysis  TweetEval  Emoji      29.3   SVM
Sentiment Analysis  TweetEval  Emotion    64.7   SVM
Sentiment Analysis  TweetEval  Hate       36.7   SVM
Sentiment Analysis  TweetEval  Irony      61.7   SVM
Sentiment Analysis  TweetEval  Offensive  52.3   SVM
Sentiment Analysis  TweetEval  Sentiment  62.9   SVM
Sentiment Analysis  TweetEval  Stance     67.3   SVM
Sentiment Analysis  TweetEval  ALL        58.1   FastText
Sentiment Analysis  TweetEval  Emoji      25.8   FastText
Sentiment Analysis  TweetEval  Emotion    65.2   FastText
Sentiment Analysis  TweetEval  Irony      63.1   FastText
Sentiment Analysis  TweetEval  Offensive  73.4   FastText
Sentiment Analysis  TweetEval  Sentiment  62.9   FastText
Sentiment Analysis  TweetEval  Stance     65.4   FastText
Sentiment Analysis  TweetEval  ALL        56.5   LSTM
Sentiment Analysis  TweetEval  Emoji      24.7   LSTM
Sentiment Analysis  TweetEval  Emotion    66     LSTM
Sentiment Analysis  TweetEval  Hate       52.6   LSTM
Sentiment Analysis  TweetEval  Irony      62.8   LSTM
Sentiment Analysis  TweetEval  Offensive  71.7   LSTM
Sentiment Analysis  TweetEval  Sentiment  58.3   LSTM
Sentiment Analysis  TweetEval  Stance     59.4   LSTM
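
The per-task rows above can be compared across models with a few lines of Python. The following is only a minimal sketch: the scores are copied from the results listed above, while the names `TASKS`, `SCORES`, and `best_per_task` are hypothetical helpers introduced for illustration and are not part of the TweetEval code. The FastText "Hate" score is not listed above, so it is left as `None` rather than filled in.

```python
# Scores copied from the results rows above, in the order given by TASKS.
# None marks a value that the listing does not provide (FastText on Hate).
TASKS = ["Emoji", "Emotion", "Hate", "Irony", "Offensive", "Sentiment", "Stance"]
SCORES = {
    "RoBERTa-Base":    [30.9, 76.1, 46.6, 59.7, 79.5, 71.3, 68.0],
    "RoBERTa-Twitter": [29.3, 72.0, 49.9, 65.4, 77.1, 69.1, 66.7],
    "SVM":             [29.3, 64.7, 36.7, 61.7, 52.3, 62.9, 67.3],
    "FastText":        [25.8, 65.2, None, 63.1, 73.4, 62.9, 65.4],
    "LSTM":            [24.7, 66.0, 52.6, 62.8, 71.7, 58.3, 59.4],
}


def best_per_task(scores, tasks):
    """Return {task: (top_score, [models achieving it])}, skipping missing values."""
    best = {}
    for i, task in enumerate(tasks):
        # Keep only the models that actually report a score for this task.
        candidates = [(vals[i], model) for model, vals in scores.items()
                      if vals[i] is not None]
        top_score = max(score for score, _ in candidates)
        # Sort the winners so that ties are reported in a stable order.
        winners = sorted(model for score, model in candidates if score == top_score)
        best[task] = (top_score, winners)
    return best


if __name__ == "__main__":
    for task, (score, models) in best_per_task(SCORES, TASKS).items():
        print(f"{task:<10} {score:>5}  {', '.join(models)}")
```

Keeping the missing value as `None` means the comparison for "Hate" covers only the four models that report a score there, so the result for that task should be read with that gap in mind.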

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)