Xinyi Wang, Jason Weston, Michael Auli, Yacine Jernite
Neural sequence-to-sequence models are well established for applications that can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on cases where generation is conditioned on both a short query and a long context, such as abstractive question answering or document-level translation. We modify the standard sequence-to-sequence approach to make better use of both the query and the context by expanding the conditioning mechanism to intertwine query and context attention. We also introduce a simple and efficient data augmentation method for the proposed model. Experiments on three different tasks show that both changes lead to consistent improvements.
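The abstract describes intertwining attention over a short query and a long context within the decoder's conditioning mechanism. The following is a minimal NumPy sketch of one way such interleaved conditioning could work, assuming simple dot-product attention with a residual update; all function names and the specific interleaving order (query first, then context) are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(h, memory):
    # Dot-product attention: h is (d,), memory is (n, d);
    # returns a weighted summary vector of shape (d,)
    scores = memory @ h
    weights = softmax(scores)
    return weights @ memory

def interleaved_step(h, query_mem, context_mem):
    # Hypothetical interleaved conditioning: attend to the short
    # query first, then use the query-informed state to attend to
    # the long context, so context attention is query-aware.
    q_vec = attend(h, query_mem)
    h_q = h + q_vec                # residual update with query summary
    c_vec = attend(h_q, context_mem)
    return h_q + c_vec             # state conditioned on both sources
```

In a full model this step would run once per decoder layer (or per decoding position), so query and context attention alternate throughout generation rather than being concatenated into a single input.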
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | ELI5 | Rouge-1 | 23.32 | Multi-Interleave |
| Question Answering | ELI5 | Rouge-2 | 4.79 | Multi-Interleave |
| Question Answering | ELI5 | Rouge-L | 14.63 | Multi-Interleave |