Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang
Deep neural models (e.g., Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs and thus impair generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g., BERT). We propose the *Adversarial Self-Attention* (ASA) mechanism, which adversarially biases the attention weights to suppress the model's reliance on specific features (e.g., particular keywords) and to encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains compared to naive training run for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in terms of both generalization and robustness.
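To make the idea concrete, the sketch below shows one crude, non-learned proxy for adversarially biasing self-attention: for each query, the most heavily attended key positions (candidate "shortcut" features) are suppressed before the attention distribution is recomputed. This is only an illustration under our own assumptions; the actual ASA mechanism learns the adversarial bias by maximizing the training loss, and the function names and the `mask_frac` parameter here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def asa_attention(q, k, v, mask_frac=0.1):
    """Scaled dot-product attention with a heuristic adversarial bias.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v). Returns (n_q, d_v).
    NOTE: masking the top-attended keys is a stand-in for the learned
    adversarial bias described in the paper, not the paper's method.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # (n_q, n_k) attention logits
    attn = softmax(scores, axis=-1)        # plain attention distribution
    n_mask = max(1, int(mask_frac * k.shape[0]))
    adv = scores.copy()
    for i in range(scores.shape[0]):
        # suppress the positions this query relies on most,
        # forcing attention to spread over the remaining tokens
        top = np.argsort(attn[i])[-n_mask:]
        adv[i, top] = -1e9
    return softmax(adv, axis=-1) @ v
```

In the paper's framing, the bias would instead be chosen adversarially (to maximize the loss) under a budget constraint, and the model is then trained against that biased attention.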
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | DREAM | Accuracy | 69.2 | ASA + RoBERTa |
| Reading Comprehension | DREAM | Accuracy | 64.3 | ASA + BERT-base |
| Natural Language Inference | MultiNLI | Matched | 88.0 | ASA + RoBERTa |
| Natural Language Inference | MultiNLI | Matched | 85.0 | ASA + BERT-base |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.892 | ASA + RoBERTa |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.865 | ASA + BERT-base |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 96.3 | ASA + RoBERTa |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 94.1 | ASA + BERT-base |
| Named Entity Recognition (NER) | WNUT 2017 | F1 | 57.3 | ASA + RoBERTa |
| Named Entity Recognition (NER) | WNUT 2017 | F1 | 49.8 | ASA + BERT-base |
| Paraphrase Identification | Quora Question Pairs | F1 | 73.7 | ASA + RoBERTa |
| Paraphrase Identification | Quora Question Pairs | F1 | 72.3 | ASA + BERT-base |