Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


IQ-Learn: Inverse soft-Q Learning for Imitation

Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Matthieu Geist, Stefano Ermon

2021-06-23 · NeurIPS 2021

Tasks: Sequential Decision Making, Imitation Learning, Atari Games, Continuous Control, Decision Making, MuJoCo Games, Q-Learning

Links: Paper · PDF · Code (official) · 4 community implementations

Abstract

In many sequential decision-making problems (e.g., robotics control, game playing, sequential prediction), human or expert data is available containing useful information about the task. However, imitation learning (IL) from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics. Behavioral cloning is widely used due to its simplicity of implementation and stable convergence, but it does not utilize any information involving the environment's dynamics. Many existing methods that exploit dynamics information are difficult to train in practice, due to either an adversarial optimization process over reward and policy approximators or biased, high-variance gradient estimators. We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function that implicitly represents both reward and policy. On standard benchmarks, the implicitly learned rewards show a high positive correlation with the ground-truth rewards, illustrating that our method can also be used for inverse reinforcement learning (IRL). Our method, Inverse soft-Q learning (IQ-Learn), obtains state-of-the-art results in offline and online imitation learning settings, significantly outperforming existing methods both in the number of required environment interactions and in scalability to high-dimensional spaces, often by more than 3x.
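The abstract's central idea is that a single Q-function implicitly defines both a policy and a reward. In the discrete-action soft-Q setting, the implicit policy is π(a|s) ∝ exp(Q(s,a)), and the inverse soft Bellman operator recovers a reward as r(s,a) = Q(s,a) − γ·E_{s'}[V(s')] with V(s) = log Σ_a exp(Q(s,a)). A minimal tabular sketch of these identities, assuming discrete states and actions (function names and the toy Q-table are illustrative, not from the paper's code):

```python
import numpy as np

def soft_value(q_row):
    """Soft value V(s) = log sum_a exp(Q(s, a)), computed stably."""
    m = q_row.max()
    return m + np.log(np.exp(q_row - m).sum())

def soft_policy(q_row):
    """Policy implicitly defined by Q: pi(a|s) proportional to exp(Q(s, a))."""
    z = np.exp(q_row - q_row.max())
    return z / z.sum()

def implied_reward(q, s, a, next_state_probs, gamma=0.99):
    """Inverse soft Bellman operator: r(s, a) = Q(s, a) - gamma * E_{s'}[V(s')]."""
    ev = sum(p * soft_value(q[s2]) for s2, p in enumerate(next_state_probs))
    return q[s, a] - gamma * ev

# Toy Q-table: 2 states, 2 actions (purely illustrative values).
Q = np.array([[1.0, 0.0],
              [0.5, 0.5]])
pi = soft_policy(Q[0])                              # implicit policy in state 0
r = implied_reward(Q, s=0, a=0, next_state_probs=[0.0, 1.0])  # implicit reward
```

The point of the paper is that optimizing a single objective over Q suffices: no separate reward network or adversarial reward/policy loop is needed, and a reward estimate can be read off Q afterwards via `implied_reward`.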

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Atari Games | Atari 2600 Space Invaders | Return | 507 | IQ-Learn |
| Atari Games | Atari 2600 Beam Rider | Return | 3025 | IQ-Learn |
| Atari Games | Atari 2600 Seaquest | Return | 2349 | IQ-Learn |
| Atari Games | Atari 2600 Q*Bert | Return | 12940 | IQ-Learn |
| Video Games | Atari 2600 Space Invaders | Return | 507 | IQ-Learn |
| Video Games | Atari 2600 Beam Rider | Return | 3025 | IQ-Learn |
| Video Games | Atari 2600 Seaquest | Return | 2349 | IQ-Learn |
| Video Games | Atari 2600 Q*Bert | Return | 12940 | IQ-Learn |
| MuJoCo Games | Walker2d | Mean | 5134 | IQ-Learn |
| MuJoCo Games | Ant | Average Return | 4362.9 | IQ-Learn |
| MuJoCo Games | Humanoid-v2 | Return | 5227.1 | IQ-Learn |

Related Papers

- Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner (2025-07-17)
- Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)
- Higher-Order Pattern Unification Modulo Similarity Relations (2025-07-17)
- Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming (2025-07-17)
- Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour (2025-07-17)
- AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
- Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM (2025-07-15)