Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

2019-08-13 · ICLR 2020

Tasks: Question Answering, Paraphrase Identification, Sentiment Analysis, Natural Language Inference, Natural Language Understanding, Semantic Textual Similarity, Linguistic Acceptability, Sentiment Classification, Language Modelling

Abstract

Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
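The two auxiliary objectives the abstract mentions can be illustrated with a small data-construction sketch. This is not the authors' code; the function names and sampling details are illustrative. It follows the paper's setup: the word-level objective shuffles a short span of consecutive tokens (trigrams in the paper) and asks the model to reconstruct the original order, while the sentence-level objective is a 3-way classification of whether the second sentence follows the first, precedes it, or comes from another document.

```python
import random

def word_structural_example(tokens, span_len=3, rng=None):
    """Word-level objective (sketch): shuffle `span_len` consecutive
    tokens at a random position; the model's target is to reconstruct
    the original tokens at those positions."""
    rng = rng or random.Random()
    start = rng.randrange(len(tokens) - span_len + 1)
    span = tokens[start:start + span_len]          # original (target) order
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + span_len:]
    return corrupted, (start, span)

def sentence_structural_example(s1, s_next, s_prev, s_random, rng=None):
    """Sentence-level objective (sketch): build a pair (s1, s2) with a
    3-way label — 0: s2 follows s1, 1: s2 precedes s1, 2: s2 is drawn
    from a random document. Sampled uniformly here for illustration."""
    rng = rng or random.Random()
    label = rng.randrange(3)
    s2 = (s_next, s_prev, s_random)[label]
    return (s1, s2), label

# Example usage
tokens = "the quick brown fox jumps over".split()
corrupted, (start, span) = word_structural_example(tokens, rng=random.Random(0))
pair, label = sentence_structural_example("S1.", "next", "prev", "rand",
                                          rng=random.Random(1))
```

In the paper both objectives are trained jointly with the masked-LM loss; the sketch above only shows how the corrupted inputs and labels would be assembled.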

Results

Task | Dataset | Metric | Value | Model
Natural Language Inference | WNLI | Accuracy | 89.7 | StructBERT RoBERTa ensemble
Natural Language Inference | MultiNLI | Matched | 91.1 | Adv-RoBERTa ensemble
Natural Language Inference | MultiNLI | Mismatched | 90.7 | Adv-RoBERTa ensemble
Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.928 | StructBERT RoBERTa ensemble
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.924 | StructBERT RoBERTa ensemble
Semantic Textual Similarity | Quora Question Pairs | Accuracy | 90.7 | StructBERT RoBERTa ensemble
Semantic Textual Similarity | Quora Question Pairs | F1 | 74.4 | StructBERT RoBERTa ensemble
Sentiment Analysis | SST-2 Binary classification | Accuracy | 97.1 | StructBERT RoBERTa ensemble
Paraphrase Identification | Quora Question Pairs | Accuracy | 90.7 | StructBERT RoBERTa ensemble
Paraphrase Identification | Quora Question Pairs | F1 | 74.4 | StructBERT RoBERTa ensemble

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)