Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

Lantao Yu, Wei-Nan Zhang, Jun Wang, Yong Yu

2016-09-18 · Text Generation · Reinforcement Learning
Paper · PDF · Code (official) · community implementations

Abstract

As a new way of training generative models, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of a generative model, have enjoyed considerable success in generating real-valued data. However, they have limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs of the generative model make it difficult to pass gradient updates from the discriminative model back to the generative model. Moreover, the discriminative model can only assess a complete sequence; for a partially generated sequence, it is non-trivial to balance its current score against the score it will receive once the sequence is complete. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. By modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by performing policy gradient updates directly. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
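The core training loop the abstract describes (sample a token, estimate its value by Monte Carlo rollouts to sequence completion scored by the discriminator, then apply a policy gradient update) can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration: a two-token vocabulary, a per-step logit table standing in for the paper's RNN generator, and a fixed heuristic scorer standing in for the trained discriminator.

```python
import math, random

random.seed(0)

VOCAB = [0, 1]          # toy token alphabet
SEQ_LEN = 4             # fixed sequence length
N_ROLLOUTS = 16         # Monte Carlo rollouts per prefix

# Toy generator: one logit row per time step (a stand-in for G_theta).
logits = [[0.0 for _ in VOCAB] for _ in range(SEQ_LEN)]

def policy(t):
    """Softmax over the vocabulary at step t."""
    z = [math.exp(l) for l in logits[t]]
    s = sum(z)
    return [p / s for p in z]

def sample_token(t):
    r, acc = random.random(), 0.0
    for tok, p in enumerate(policy(t)):
        acc += p
        if r < acc:
            return tok
    return VOCAB[-1]

def discriminator(seq):
    """Stand-in for D_phi: scores sequences by their fraction of 1s."""
    return sum(seq) / len(seq)

def rollout_reward(prefix):
    """Q-value estimate: complete the prefix N times, average D's score."""
    total = 0.0
    for _ in range(N_ROLLOUTS):
        seq = list(prefix)
        for t in range(len(prefix), SEQ_LEN):
            seq.append(sample_token(t))
        total += discriminator(seq)
    return total / N_ROLLOUTS

def policy_gradient_step(lr=0.5):
    """One REINFORCE update: scale log-prob gradients by the MC reward."""
    seq = []
    for t in range(SEQ_LEN):
        a = sample_token(t)
        seq.append(a)
        adv = rollout_reward(seq) - 0.5   # constant baseline (assumption)
        probs = policy(t)
        for tok in VOCAB:                  # grad of log pi wrt the logits
            grad = (1.0 if tok == a else 0.0) - probs[tok]
            logits[t][tok] += lr * adv * grad

for _ in range(200):
    policy_gradient_step()

# The generator should now strongly prefer token 1 at every step.
print([round(policy(t)[1], 2) for t in range(SEQ_LEN)])
```

In the paper the rollouts are produced by the generator itself and the discriminator is retrained adversarially between generator updates; this sketch freezes the discriminator to keep the policy gradient mechanics visible.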

Results

Task             Dataset        Metric  Value   Model
Text Generation  COCO Captions  BLEU-2  0.831   SeqGAN
Text Generation  COCO Captions  BLEU-3  0.642   SeqGAN
Text Generation  COCO Captions  BLEU-4  0.521   SeqGAN
Text Generation  COCO Captions  BLEU-5  0.427   SeqGAN
Text Generation  EMNLP2017 WMT  BLEU-2  0.859   SeqGAN
Text Generation  EMNLP2017 WMT  BLEU-3  0.6015  SeqGAN
Text Generation  EMNLP2017 WMT  BLEU-4  0.4541  SeqGAN
Text Generation  EMNLP2017 WMT  BLEU-5  0.4498  SeqGAN
Text Generation  Chinese Poems  BLEU-2  0.738   SeqGAN
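The BLEU-n scores in the table are n-gram overlap metrics in the style of Papineni et al.: a geometric mean of modified n-gram precisions up to order n, multiplied by a brevity penalty. The sketch below is a minimal sentence-level version of that standard definition; the leaderboard numbers come from the paper's own evaluation, which may use a different script or corpus-level computation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n: geometric mean of modified n-gram
    precisions (n = 1..max_n), times a brevity penalty."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its best reference count.
        clipped = Counter()
        for ref in references:
            ref_counts = ngrams(ref, n)
            for g in cand:
                clipped[g] = max(clipped[g], min(cand[g], ref_counts[g]))
        matched = sum(clipped.values())
        total = max(sum(cand.values()), 1)
        if matched == 0:
            return 0.0
        log_prec += math.log(matched / total) / max_n
    # Brevity penalty against the closest reference length.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(candidate)), l))
    bp = 1.0 if len(candidate) >= ref_len else \
        math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(log_prec)
```

For example, a candidate identical to a reference scores 1.0, while a candidate sharing no n-gram of the highest order scores 0.0 under this unsmoothed definition.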

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)