Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Hierarchical Learning for Generation with Long Source Sequences

Tobias Rohde, Xiaoxia Wu, Yinhan Liu

2021-04-15 · Reading Comprehension · Machine Translation · Text Summarization · Document Summarization · Document Level Machine Translation · Translation · Document Translation · General Classification

Paper · PDF

Abstract

One of the challenges for current sequence to sequence (seq2seq) models is processing long sequences, such as those in summarization and document level machine translation tasks. These tasks require the model to reason at the token level as well as the sentence and paragraph level. We design and study a new Hierarchical Attention Transformer-based architecture (HAT) that outperforms standard Transformers on several sequence to sequence tasks. Furthermore, our model achieves state-of-the-art ROUGE scores on several summarization datasets, including PubMed, arXiv, CNN/DM, SAMSum, and AMI. Our model outperforms the document-level machine translation baseline on the WMT20 English to German translation task. We investigate what the hierarchical layers learn by visualizing the hierarchical encoder-decoder attention. Finally, we study hierarchical learning on encoder-only pre-training and analyze its performance on classification tasks.
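The hierarchical encoder sketched in the abstract adds attention layers that operate on sentence-level representations (e.g., the hidden states at sentence-boundary tokens) on top of standard token-level Transformer layers, so the model can mix information across sentences cheaply. A minimal numpy sketch of that idea follows; the function name, the single-head unprojected attention, and the use of BOS hidden states as sentence vectors are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_encoder_layer(tokens, sent_starts):
    """One hierarchical layer: attend across sentence-level representations.

    tokens:      (n_tokens, d) hidden states from a token-level encoder layer
    sent_starts: indices of each sentence's boundary (BOS) token, whose hidden
                 state stands in for the whole sentence
    """
    d = tokens.shape[1]
    sent_vecs = tokens[sent_starts]                 # (n_sents, d) sentence reps
    scores = sent_vecs @ sent_vecs.T / np.sqrt(d)   # scaled dot-product scores
    attn = softmax(scores, axis=-1)                 # sentence-to-sentence weights
    mixed = attn @ sent_vecs                        # each sentence sees the others
    out = tokens.copy()
    out[sent_starts] = tokens[sent_starts] + mixed  # residual update at BOS slots
    return out
```

In the full architecture these sentence-level updates would be interleaved with ordinary token-level layers (plus projections, multiple heads, and layer norm), letting the decoder attend to both token- and sentence-level states.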

Results

Task                              | Dataset                      | Metric  | Value | Model
Reading Comprehension             | RACE                         | Accuracy| 67.3  | HAT (Encoder)
Text Summarization                | SAMSum                       | ROUGE-1 | 53.01 | HAT-CNNDM
Text Summarization                | SAMSum                       | ROUGE-2 | 28.27 | HAT-CNNDM
Text Summarization                | SAMSum                       | ROUGE-L | 48.84 | HAT-CNNDM RL
Text Summarization                | Arxiv HEP-TH citation graph  | ROUGE-1 | 46.74 | HAT-BART
Text Summarization                | Arxiv HEP-TH citation graph  | ROUGE-2 | 19.19 | HAT-BART
Text Summarization                | Arxiv HEP-TH citation graph  | ROUGE-L | 42.2  | HAT-BART
Text Summarization                | AMI                          | ROUGE-1 | 52.27 | HAT-CNNDM
Text Summarization                | AMI                          | ROUGE-2 | 20.15 | HAT-CNNDM
Text Summarization                | AMI                          | ROUGE-L | 50.57 | HAT-CNNDM
Text Summarization                | PubMed                       | ROUGE-1 | 48.25 | HAT-BART
Text Summarization                | PubMed                       | ROUGE-2 | 21.35 | HAT-BART
Text Summarization                | PubMed                       | ROUGE-L | 36.69 | HAT-BART
Text Summarization                | X-Sum                        | ROUGE-1 | 45.92 | HAT-BART
Text Summarization                | X-Sum                        | ROUGE-2 | 22.79 | HAT-BART
Text Summarization                | CNN / Daily Mail             | ROUGE-1 | 44.48 | HAT-BART
Text Summarization                | CNN / Daily Mail             | ROUGE-2 | 21.31 | HAT-BART
Text Summarization                | CNN / Daily Mail             | ROUGE-L | 41.52 | HAT-BART
Document Summarization            | CNN / Daily Mail             | ROUGE-1 | 44.48 | HAT-BART
Document Summarization            | CNN / Daily Mail             | ROUGE-2 | 21.31 | HAT-BART
Document Summarization            | CNN / Daily Mail             | ROUGE-L | 41.52 | HAT-BART
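The values above are ROUGE F1 scores: ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between the generated and reference summaries, and ROUGE-L measures their longest common subsequence. A minimal self-contained sketch of ROUGE-1 and ROUGE-L F1 is below; note that reported results typically come from the official ROUGE toolkit, which adds stemming and other preprocessing this sketch omits:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    c, r = candidate.split(), reference.split()
    overlap = sum((Counter(c) & Counter(r)).values())  # clipped counts
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def rougeL_f(candidate: str, reference: str) -> float:
    """ROUGE-L F1: based on the longest common subsequence of the word lists."""
    c, r = candidate.split(), reference.split()
    # Standard LCS dynamic program over the two token sequences.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cw == rw else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

Unlike ROUGE-1, ROUGE-L is order-sensitive: a candidate with the right words in the wrong order scores a perfect ROUGE-1 but a low ROUGE-L.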

Related Papers

- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
- Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
- Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
- DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy (2025-07-02)