
Calibrating Sequence Likelihood Improves Conditional Language Generation

Yao Zhao, Misha Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J. Liu

Published: 2022-09-30
Tasks: Question Answering, Data-to-Text Generation, Text Generation, Abstractive Text Summarization, Text Summarization, Question Generation, Blocking

Abstract

Conditional language models are predominantly trained with maximum likelihood estimation (MLE), which assigns probability mass to sparsely observed target sequences. While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality. This has been observed empirically in beam search decoding, where output quality degrades with large beam sizes and decoding strategies benefit from heuristics such as length normalization and repetition blocking. In this work, we introduce sequence likelihood calibration (SLiC), in which the likelihoods of model-generated sequences are calibrated to better align with reference sequences in the model's latent space. With SLiC, decoding heuristics become unnecessary and the quality of decoding candidates improves significantly regardless of the decoding method. Furthermore, SLiC shows no sign of diminishing returns with model scale and offers alternative ways to improve quality under limited training and inference budgets. With SLiC, we exceed or match SOTA results on a wide range of generation tasks spanning abstractive summarization, question generation, abstractive question answering, and data-to-text generation, even with modest-sized models.
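The calibration stage the abstract describes works on decoded candidates: sample sequences from the MLE-trained model, score each candidate's similarity to the reference in the model's own latent space, then fine-tune so that sequence likelihood tracks that similarity. Below is a minimal PyTorch sketch of one such calibration objective, a pairwise hinge rank loss. The function name, tensor shapes, and the `margin` value are illustrative assumptions, not the paper's exact formulation; the paper also studies other calibration losses and adds a regularization term that keeps the calibrated model close to the MLE model.

```python
import torch

def slic_rank_loss(seq_logprobs: torch.Tensor,
                   similarities: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    """Pairwise rank-calibration loss (illustrative sketch, not the paper's code).

    seq_logprobs: (N,) sequence log-likelihoods of N decoded candidates
                  under the model being calibrated.
    similarities: (N,) similarity of each candidate to the reference,
                  e.g. cosine similarity of decoder hidden states.
    """
    # Pairwise differences: entry (i, j) compares candidate i against j.
    sim_diff = similarities.unsqueeze(1) - similarities.unsqueeze(0)  # (N, N)
    lp_diff = seq_logprobs.unsqueeze(1) - seq_logprobs.unsqueeze(0)   # (N, N)

    # A pair (i, j) is "positive" when candidate i is more similar
    # to the reference than candidate j.
    pos_pairs = (sim_diff > 0).float()

    # Hinge: penalize positive pairs whose better candidate is not ranked
    # at least `margin` nats higher in sequence log-likelihood.
    loss = torch.clamp(margin - lp_diff, min=0.0) * pos_pairs
    return loss.sum() / pos_pairs.sum().clamp(min=1.0)

if __name__ == "__main__":
    # Toy example: the most reference-similar candidate (index 0) is not
    # the most likely one (index 1), so the loss is nonzero.
    logprobs = torch.tensor([-12.3, -10.1, -15.7], requires_grad=True)
    sims = torch.tensor([0.81, 0.64, 0.42])
    loss = slic_rank_loss(logprobs, sims)
    loss.backward()
    print(float(loss))
```

In a full training setup this calibration term would be combined with a regularizer (e.g., cross-entropy or KL toward the MLE checkpoint) and averaged over batches of contexts, each with its own candidate set; those details are omitted here.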

Results

Task                            Dataset           Metric   Value  Model
Text Summarization              Reddit TIFU       ROUGE-1  32.03  PEGASUS 2B + SLiC
Text Summarization              Reddit TIFU       ROUGE-2  11.13  PEGASUS 2B + SLiC
Text Summarization              Reddit TIFU       ROUGE-L  25.51  PEGASUS 2B + SLiC
Text Summarization              SAMSum            ROUGE-1  54.37  PEGASUS 2B + SLiC
Text Summarization              SAMSum            ROUGE-2  29.88  PEGASUS 2B + SLiC
Text Summarization              SAMSum            ROUGE-L  45.89  PEGASUS 2B + SLiC
Text Summarization              CNN / Daily Mail  ROUGE-1  47.36  Pegasus
Text Summarization              CNN / Daily Mail  ROUGE-2  24.02  Pegasus
Text Summarization              CNN / Daily Mail  ROUGE-L  44.45  Pegasus
Abstractive Text Summarization  CNN / Daily Mail  ROUGE-1  47.36  Pegasus
Abstractive Text Summarization  CNN / Daily Mail  ROUGE-2  24.02  Pegasus
Abstractive Text Summarization  CNN / Daily Mail  ROUGE-L  44.45  Pegasus

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)