Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Reinforcement Learning with Latent Flow

Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin

Published: 2021-01-06 · NeurIPS 2021
Tasks: Optical Flow Estimation · Reinforcement Learning · Atari Games · Continuous Control · Montezuma's Revenge · Video Classification
Paper · PDF · Code (official)

Abstract

Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture temporal information present in the image observations. This heuristic stands in contrast to the current paradigm in video classification architectures, which utilize explicit encodings of temporal information through methods such as optical flow and two-stream architectures to achieve state-of-the-art performance. Inspired by leading video classification architectures, we introduce the Flow of Latents for Reinforcement Learning (Flare), a network architecture for RL that explicitly encodes temporal information through latent vector differences. We show that Flare (i) recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information, (ii) achieves state-of-the-art performance on challenging pixel-based continuous control tasks within the DeepMind control benchmark suite, namely quadruped walk, hopper hop, finger turn hard, pendulum swing up, and walker run, (iii) is the most sample-efficient model-free pixel-based RL algorithm, outperforming the prior model-free state of the art by 1.9X and 1.5X on the 500k and 1M step benchmarks, respectively, and (iv), when augmented over Rainbow DQN, outperforms this state-of-the-art baseline on 5 of 8 challenging Atari games at the 100M time step benchmark.
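For intuition, the sketch below illustrates the "flow of latents" idea described in the abstract: each frame in the observation stack is encoded separately, differences between consecutive latent vectors are computed, and those differences are fused with the latents to form the feature vector passed to the RL agent. This is a minimal illustrative sketch, not the paper's exact architecture; the class name FlareEncoder, the convolutional stack, the layer sizes, and the concatenation-based fusion are assumptions made for the example.

```python
import torch
import torch.nn as nn


class FlareEncoder(nn.Module):
    """Illustrative sketch of latent-flow features for pixel-based RL.

    Each frame is encoded independently; differences between consecutive
    latent vectors ("flow of latents") are concatenated with the latents
    themselves. Architecture details here are assumptions, not the paper's.
    """

    def __init__(self, latent_dim: int = 50):
        super().__init__()
        # Per-frame convolutional encoder (illustrative sizes).
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        # Encode every frame separately, then restore the time dimension.
        z = self.conv(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        # Latent differences between consecutive frames: (batch, time - 1, latent_dim)
        dz = z[:, 1:] - z[:, :-1]
        # Fuse latents and their temporal differences into one feature vector.
        return torch.cat([z.reshape(b, -1), dz.reshape(b, -1)], dim=-1)


if __name__ == "__main__":
    enc = FlareEncoder(latent_dim=50)
    obs = torch.randn(8, 3, 3, 64, 64)  # batch of 8, stack of 3 RGB frames
    features = enc(obs)
    print(features.shape)  # torch.Size([8, 250]) = 3*50 latents + 2*50 differences
```

The resulting feature vector would then replace the usual frame-stacked CNN features as input to the downstream actor and critic networks.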

Results

Task        | Dataset                         | Metric                | Value | Model
Atari Games | Atari 2600 Montezuma's Revenge  | Average Return (NoOp) | 1668  | Flare
Atari Games | Atari 2600 Montezuma's Revenge  | Average Return (NoOp) | 900   | Rainbow (tuned)
Video Games | Atari 2600 Montezuma's Revenge  | Average Return (NoOp) | 1668  | Flare
Video Games | Atari 2600 Montezuma's Revenge  | Average Return (NoOp) | 900   | Rainbow (tuned)

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)