Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RealFormer: Transformer Likes Residual Attention

Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie

Published: 2020-12-21 · Findings (ACL) 2021

Tasks: Machine Translation, Paraphrase Identification, Sentiment Analysis, Natural Language Inference, Masked Language Modeling, Natural Questions, Translation, Semantic Textual Similarity, Linguistic Acceptability, Language Modelling

Paper · PDF · Code (official)

Abstract

Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants (BERT, ETC, etc.) on a wide spectrum of tasks including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. We also observe empirically that RealFormer stabilizes training and leads to models with sparser attention. Source code and pre-trained checkpoints for RealFormer can be found at https://github.com/google-research/google-research/tree/master/realformer.
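The core idea described in the abstract is a residual connection over attention scores: each layer adds the previous layer's raw (pre-softmax) attention scores to its own before applying softmax. Below is a minimal single-head NumPy sketch of that idea; the function names, shapes, and single-head setup are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """Single-head attention with a RealFormer-style residual score edge.

    The previous layer's raw (pre-softmax) scores are added to this
    layer's scores; the summed scores are returned so the next layer
    can continue the residual chain.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # this layer's raw attention scores
    if prev_scores is not None:
        scores = scores + prev_scores   # residual edge over attention scores
    out = softmax(scores) @ v
    return out, scores

# Toy usage: chain two self-attention layers, passing scores forward.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out1, s1 = residual_attention(x, x, x)
out2, s2 = residual_attention(out1, out1, out1, prev_scores=s1)
```

In a full model this skip connection would run through every layer (and every head), which is what the abstract credits with stabilizing training and producing sparser attention.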

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Natural Language Inference | MultiNLI | Matched | 86.28 | RealFormer |
| Natural Language Inference | MultiNLI | Mismatched | 86.34 | RealFormer |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.9011 | RealFormer |
| Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.8988 | RealFormer |
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 91.34 | RealFormer |
| Semantic Textual Similarity | Quora Question Pairs | F1 | 88.28 | RealFormer |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 94.04 | RealFormer |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 91.34 | RealFormer |
| Paraphrase Identification | Quora Question Pairs | F1 | 88.28 | RealFormer |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)