Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang
Language model pre-training has proven useful for learning universal language representations. BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art pre-trained language model, has achieved remarkable results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification tasks and provide a general solution for BERT fine-tuning. The proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
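As a concrete illustration of the basic fine-tuning setup, the sketch below attaches a classification head to a pre-trained BERT encoder and trains it end-to-end. It assumes the HuggingFace `transformers` and `datasets` libraries; the paper does not prescribe a toolkit, and the dataset choice (IMDb) and hyperparameters here are illustrative rather than the paper's exact configuration.

```python
# Minimal BERT fine-tuning sketch for text classification.
# Assumptions: HuggingFace `transformers`/`datasets`; IMDb as an example
# dataset; hyperparameters are typical values, not the paper's exact setup.
from datasets import load_dataset
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 2 labels for binary sentiment

dataset = load_dataset("imdb")

def tokenize(batch):
    # BERT accepts at most 512 sub-word tokens per sequence.
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-imdb",
    learning_rate=2e-5,             # small LR to avoid catastrophic forgetting
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

Trainer(model=model, args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        tokenizer=tokenizer).train()
```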
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Sentiment Analysis | Yelp Fine-grained classification | Error | 28.62 | BERT_large+ITPT |
| Sentiment Analysis | Yelp Fine-grained classification | Error | 29.42 | BERT_base+ITPT |
| Sentiment Analysis | Yelp Binary classification | Error | 1.81 | BERT_large+ITPT |
| Sentiment Analysis | Yelp Binary classification | Error | 1.92 | BERT_base+ITPT |
| Sentiment Analysis | IMDb | Accuracy | 95.79 | BERT_large+ITPT |
| Sentiment Analysis | IMDb | Accuracy | 95.63 | BERT_base+ITPT |
| Text Classification | Sogou News | Accuracy | 98.07 | BERT-ITPT-FiT |
| Text Classification | TREC-6 | Error | 3.2 | BERT-ITPT-FiT |
| Text Classification | DBpedia | Error | 0.68 | BERT-ITPT-FiT |
| Text Classification | AG News | Error | 4.8 | BERT-ITPT-FiT |
| Text Classification | Yahoo! Answers | Accuracy | 77.62 | BERT-ITPT-FiT |
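In the table, ITPT denotes withIn-Task Pre-Training: BERT-ITPT-FiT stands for "BERT + withIn-Task Pre-Training + Fine-Tuning", i.e. the masked-language-model objective is continued on the task's own text before the supervised fine-tuning step. A minimal sketch of that pre-training step follows, again assuming HuggingFace `transformers`/`datasets`; the dataset and hyperparameters are illustrative.

```python
# Within-task pre-training (ITPT) sketch: continue masked-language-model
# training on the task's own unlabeled text, then fine-tune from the result.
# Assumptions: HuggingFace `transformers`/`datasets`; IMDb as an example;
# the epoch count and masking rate are illustrative defaults.
from datasets import load_dataset
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

corpus = load_dataset("imdb", split="train")
corpus = corpus.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512),
    batched=True, remove_columns=corpus.column_names)  # keep only token ids

# Randomly masks 15% of tokens, the standard BERT masking rate.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-itpt", num_train_epochs=1),
    train_dataset=corpus,
    data_collator=collator,
).train()
model.save_pretrained("bert-itpt")  # fine-tune from this checkpoint afterwards
```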