TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Decision Transformer: Reinforcement Learning via Sequence ...

Decision Transformer: Reinforcement Learning via Sequence Modeling

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

2021-06-02NeurIPS 2021 12Reinforcement LearningOffline RLAtari GamesOpenAI GymD4RLLanguage Modellingreinforcement-learning
PaperPDFCodeCodeCodeCode(official)CodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCode

Abstract

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

Results

TaskDatasetMetricValueModel
Atari GamesAtari 2600 PongScore17.1DT
Atari GamesAtari 2600 BreakoutScore267.5DT
Atari GamesAtari 2600 SeaquestScore2.4DT
Atari GamesAtari 2600 Q*BertScore25.1DT
Video GamesAtari 2600 PongScore17.1DT
Video GamesAtari 2600 BreakoutScore267.5DT
Video GamesAtari 2600 SeaquestScore2.4DT
Video GamesAtari 2600 Q*BertScore25.1DT
General Reinforcement LearningD4RLAverage Reward73.5Decision Transformer (DT)
Offline RLD4RLAverage Reward73.5Decision Transformer (DT)
MuJoCo GamesD4RLAverage Reward72.2Decision Transformer (DT)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning2025-07-18VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17Spectral Bellman Method: Unifying Representation and Exploration in RL2025-07-17Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback2025-07-17VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks2025-07-17QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation2025-07-17Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17