Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

Hao Tan, Licheng Yu, Mohit Bansal

Published: 2019-04-08 (NAACL 2019)
Tasks: Vision-Language Navigation, Reinforcement Learning, Navigate, Translation
Links: Paper, PDF, Code (official)

Abstract

A grand goal in AI is to build a robot that can accurately navigate based on natural language instructions, which requires the agent to perceive the scene, understand and ground language, and act in the real-world environment. One key challenge here is to learn to navigate in new environments that are unseen during training. Most existing approaches perform dramatically worse in unseen environments than in seen ones. In this paper, we present a generalizable navigational agent. Our agent is trained in two stages. The first stage is training via mixed imitation and reinforcement learning, combining the benefits of both off-policy and on-policy optimization. The second stage is fine-tuning via newly-introduced 'unseen' triplets (environment, path, instruction). To generate these unseen triplets, we propose a simple but effective 'environmental dropout' method to mimic unseen environments, which overcomes the problem of limited seen environment variability. Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions. Empirically, we show that our agent generalizes substantially better when fine-tuned with these triplets, outperforming the state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task, and achieving the top rank on the leaderboard.
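The core idea of environmental dropout, as the abstract describes it, is to zero out visual feature channels consistently across an entire environment, so the perturbed features look like a coherent new environment rather than per-frame noise. A minimal sketch of that idea (assuming panoramic features shaped `(viewpoints, views, feat_dim)`; the function name and shapes are illustrative, not the authors' code):

```python
import numpy as np

def environmental_dropout(env_features, p=0.5, rng=None):
    """Mimic an 'unseen' environment by dropping whole feature channels.

    env_features: array of shape (num_viewpoints, num_views, feat_dim)
                  holding visual features for one environment.
    p:            probability of dropping each feature channel.

    Unlike standard dropout, a single channel mask is shared by every
    viewpoint and view of the environment, so the perturbed environment
    stays internally consistent; kept channels are rescaled by 1/(1-p).
    """
    rng = np.random.default_rng() if rng is None else rng
    feat_dim = env_features.shape[-1]
    # One mask per environment, broadcast over viewpoints and views.
    mask = (rng.random(feat_dim) > p).astype(env_features.dtype) / (1.0 - p)
    return env_features * mask
```

New (path, instruction) pairs for the dropped-out environment would then come from back-translation: a trained speaker model generates instructions for sampled paths, and the agent is fine-tuned on the resulting triplets.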

Results

Task                            Dataset        Metric          Value    Model
Vision-Language Navigation      Room2Room      spl             0.61     R2R+EnvDrop
Vision and Language Navigation  VLN Challenge  error           3.26     n/a
Vision and Language Navigation  VLN Challenge  length          686.82   n/a
Vision and Language Navigation  VLN Challenge  oracle success  0.99     n/a
Vision and Language Navigation  VLN Challenge  spl             0.01     n/a
Vision and Language Navigation  VLN Challenge  success         0.69     n/a
Vision and Language Navigation  VLN Challenge  error           5.23     Back Translation with Environmental Dropout (no beam search)
Vision and Language Navigation  VLN Challenge  length          11.66    Back Translation with Environmental Dropout (no beam search)
Vision and Language Navigation  VLN Challenge  oracle success  0.59     Back Translation with Environmental Dropout (no beam search)
Vision and Language Navigation  VLN Challenge  spl             0.47     Back Translation with Environmental Dropout (no beam search)
Vision and Language Navigation  VLN Challenge  success         0.51     Back Translation with Environmental Dropout (no beam search)
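The `spl` metric above is Success weighted by Path Length, the standard VLN evaluation measure: for each episode i, success S_i is weighted by the ratio of the shortest-path length l_i to the longer of the agent's path length p_i and l_i, then averaged. A small sketch of that computation (episode lists are hypothetical inputs, not data from this leaderboard):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length.

    successes:        per-episode success indicators (0 or 1).
    shortest_lengths: per-episode shortest-path distance to the goal.
    path_lengths:     per-episode length of the path the agent took.

    SPL = (1/N) * sum_i( S_i * l_i / max(p_i, l_i) ), so an agent is
    rewarded for succeeding AND for taking a near-shortest path.
    """
    assert len(successes) == len(shortest_lengths) == len(path_lengths)
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)
```

This explains why the beam-search-style entries above can pair a high success rate with an SPL near 0.01: very long search paths drive the l_i / max(p_i, l_i) ratio toward zero.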

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)