Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. BERTweet has the same architecture as BERT-base (Devlin et al., 2019) and is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), and improves on previous state-of-the-art results on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition, and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. BERTweet is available at https://github.com/VinAIResearch/BERTweet
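Before feeding Tweets to the model, the BERTweet repository normalizes them by mapping user mentions and web links to the special tokens `@USER` and `HTTPURL`. Below is a minimal regex-based sketch of that normalization step; the repository's actual pre-processing additionally uses NLTK's `TweetTokenizer` and the `emoji` package, which this illustration omits.

```python
import re

def normalize_tweet(text: str) -> str:
    """Sketch of BERTweet-style Tweet normalization (illustrative only)."""
    # Replace @-mentions with the special token @USER.
    text = re.sub(r"@\w+", "@USER", text)
    # Replace web links with the special token HTTPURL.
    text = re.sub(r"https?://\S+|www\.\S+", "HTTPURL", text)
    # Collapse repeated whitespace.
    return " ".join(text.split())

print(normalize_tweet("@jack check this out https://t.co/abc123 !!"))
# → "@USER check this out HTTPURL !!"
```

The normalized text can then be tokenized with BERTweet's fastBPE vocabulary; for real use, prefer the normalization shipped with the repository so the input distribution matches pre-training.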
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Part-Of-Speech Tagging | Ritter | Acc | 90.1 | BERTweet |
| Part-Of-Speech Tagging | Tweebank | Acc | 95.2 | BERTweet |
| Text Classification | TweetEval (ALL) | Avg. score | 67.9 | BERTweet |
| Text Classification | TweetEval (Emoji) | Macro F1 | 33.4 | BERTweet |
| Text Classification | TweetEval (Emotion) | Macro F1 | 79.3 | BERTweet |
| Text Classification | TweetEval (Irony) | F1 (irony) | 82.1 | BERTweet |
| Text Classification | TweetEval (Offensive) | Macro F1 | 79.5 | BERTweet |
| Text Classification | TweetEval (Sentiment) | Macro Recall | 73.4 | BERTweet |
| Text Classification | TweetEval (Stance) | Macro F1 | 71.2 | BERTweet |
| Named Entity Recognition (NER) | WNUT 2017 | F1 | 56.5 | BERTweet |
| Named Entity Recognition (NER) | WNUT 2016 | F1 | 52.1 | BERTweet |