Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Adversarial Self-Attention for Language Understanding

Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

2022-06-25

Tasks: Paraphrase Identification · Sentiment Analysis · Natural Language Inference · Semantic Similarity · Semantic Textual Similarity · Named Entity Recognition (NER) · Machine Reading Comprehension

Paper · PDF · Code (official)

Abstract

Deep neural models (e.g. Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g. BERT). We propose the Adversarial Self-Attention mechanism (ASA), which adversarially biases the attentions to suppress the model's reliance on individual features (e.g. specific keywords) and encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains compared to naive training for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.
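The core idea in the abstract can be illustrated with a small sketch: compute attention weights, then bias them so that each query's strongest keys are suppressed, forcing attention onto the rest of the context. Note this is only a greedy stand-in for intuition; ASA itself learns the biasing mask adversarially (by maximizing the task loss under a constraint), and the function and parameter names below are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adversarially_masked_attention(scores, n_mask=1):
    """Worst-case-style masking for intuition: for each query row,
    suppress the n_mask keys it attends to most, then renormalize.
    ASA proper learns this mask adversarially rather than greedily."""
    attn = softmax(scores)
    biased = scores.astype(float).copy()
    # Indices of each row's strongest keys.
    top = np.argsort(attn, axis=-1)[:, -n_mask:]
    for q, ks in enumerate(top):
        biased[q, ks] = -1e9  # effectively zero after softmax
    return softmax(biased)

# Toy query-key score matrix: row 0 leans heavily on key 0,
# row 1 on key 1 — the "shortcut" keys get masked out.
scores = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.0, 0.1]])
adv = adversarially_masked_attention(scores)
```

After masking, the probability mass that previously concentrated on the dominant key is redistributed over the remaining keys, which is the behavior the paper uses to discourage keyword shortcuts.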

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | DREAM | Accuracy | 69.2 | ASA + RoBERTa |
| Reading Comprehension | DREAM | Accuracy | 64.3 | ASA + BERT-base |
| Natural Language Inference | MultiNLI | Matched | 88 | ASA + RoBERTa |
| Natural Language Inference | MultiNLI | Matched | 85 | ASA + BERT-base |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.892 | ASA + RoBERTa |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.865 | ASA + BERT-base |
| Semantic Textual Similarity | Quora Question Pairs | F1 | 73.7 | ASA + RoBERTa |
| Semantic Textual Similarity | Quora Question Pairs | F1 | 72.3 | ASA + BERT-base |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 96.3 | ASA + RoBERTa |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 94.1 | ASA + BERT-base |
| Named Entity Recognition (NER) | WNUT 2017 | F1 | 57.3 | ASA + RoBERTa |
| Named Entity Recognition (NER) | WNUT 2017 | F1 | 49.8 | ASA + BERT-base |
| Paraphrase Identification | Quora Question Pairs | F1 | 73.7 | ASA + RoBERTa |
| Paraphrase Identification | Quora Question Pairs | F1 | 72.3 | ASA + BERT-base |

Related Papers

- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
- DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
- GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
- DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)