Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si
Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks: it pushes the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
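The abstract does not spell out how the two auxiliary tasks build their training examples, so the following is only a minimal sketch under stated assumptions: that the word-level task asks the model to restore the original order of a short shuffled span, and that the sentence-level task classifies whether the second sentence of a pair follows the first, precedes it, or is unrelated. The function names, the span length `k=3`, and the label encoding are illustrative choices, not taken from the text above.

```python
import random


def shuffle_span(tokens, start, k=3, rng=random):
    """Word-level example (assumed form): permute tokens[start:start+k].

    Returns the corrupted token list and the original span, which serves as
    the target the model would be trained to reconstruct.
    """
    span = tokens[start:start + k]
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + k:]
    return corrupted, span


def make_sentence_order_example(sentences, i, rng=random):
    """Sentence-level example (assumed form) for an interior sentence i.

    Label 0: second sentence is the next one.
    Label 1: second sentence is the previous one.
    Label 2: second sentence is a non-adjacent one, standing in here for a
             randomly sampled unrelated sentence.
    Requires 1 <= i <= len(sentences) - 2 and at least four sentences.
    """
    label = rng.randrange(3)
    if label == 0:
        second = sentences[i + 1]
    elif label == 1:
        second = sentences[i - 1]
    else:
        candidates = [s for j, s in enumerate(sentences) if abs(j - i) > 1]
        second = rng.choice(candidates)
    return sentences[i], second, label


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the quick brown fox jumps".split()
    print(shuffle_span(tokens, start=1, k=3, rng=rng))
    print(make_sentence_order_example(["s0", "s1", "s2", "s3", "s4"], 2, rng=rng))
```

Both helpers only construct inputs and targets; the prediction heads and losses that would consume them are left out, since the abstract gives no detail on them.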
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Natural Language Inference | WNLI | Accuracy | 89.7 | StructBERTRoBERTa ensemble |
| Natural Language Inference | MultiNLI | Accuracy (matched) | 91.1 | Adv-RoBERTa ensemble |
| Natural Language Inference | MultiNLI | Accuracy (mismatched) | 90.7 | Adv-RoBERTa ensemble |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.928 | StructBERTRoBERTa ensemble |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.924 | StructBERTRoBERTa ensemble |
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 90.7 | StructBERTRoBERTa ensemble |
| Semantic Textual Similarity | Quora Question Pairs | F1 | 74.4 | StructBERTRoBERTa ensemble |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 97.1 | StructBERTRoBERTa ensemble |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 90.7 | StructBERTRoBERTa ensemble |
| Paraphrase Identification | Quora Question Pairs | F1 | 74.4 | StructBERTRoBERTa ensemble |