Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Jiannan Xiang, Xin Eric Wang, William Yang Wang

2020-09-28 | Findings of the Association for Computational Linguistics 2020 | Vision-Language Navigation, Navigate, Vision and Language Navigation

Abstract

Vision-and-Language Navigation (VLN) is a natural language grounding task in which an agent learns to follow language instructions and navigate to specified destinations in real-world environments. A key challenge is recognizing the correct location and stopping there, especially in complicated outdoor environments. Existing methods treat the STOP action the same as every other action, which leads to undesirable behavior: the agent often fails to stop at the destination even when it is on the right path. Therefore, we propose Learning to Stop (L2Stop), a simple yet effective policy module that differentiates STOP from other actions. Our approach achieves a new state of the art on Touchdown, a challenging urban VLN dataset, outperforming the baseline by 6.89% (absolute improvement) on Success weighted by Edit Distance (SED).
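
To make the idea of separating STOP from the other actions concrete, here is a minimal sketch of a two-stage decision rule: one score decides whether to stop, and only if the agent does not stop is a movement action chosen. This illustrates the general idea only, not the paper's implementation. The function name `two_stage_policy`, the movement action names, the 0.5 threshold, and the scalar inputs are all assumptions introduced for this example.

```python
import math

# Hypothetical movement actions; the names are assumptions for illustration.
MOVE_ACTIONS = ["FORWARD", "LEFT", "RIGHT"]


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_policy(stop_logit, move_logits, stop_threshold=0.5):
    """Decide STOP separately from the choice among movement actions.

    stop_logit:  score from a (hypothetical) stop head.
    move_logits: scores from a (hypothetical) direction head,
                 one per entry in MOVE_ACTIONS.
    Returns the chosen action name and the stop probability.
    """
    p_stop = sigmoid(stop_logit)
    if p_stop >= stop_threshold:
        return "STOP", p_stop
    # STOP never competes with movement actions inside this softmax.
    probs = softmax(move_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return MOVE_ACTIONS[best], p_stop


action, p_stop = two_stage_policy(-3.0, [0.1, 2.0, -1.0])
print(action, round(p_stop, 3))
```

In a single softmax over all actions, STOP competes directly with the movement actions at every step. Splitting the decision, as sketched here, lets the stop score be trained and thresholded on its own.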

Results

Task	Dataset	Metric	Value	Model
Vision and Language Navigation	Touchdown Dataset	Task Completion (TC)	16.68	ARC + L2STOP
Vision and Language Navigation	Touchdown Dataset	Task Completion (TC)	14.13	ARC
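
The table reports Task Completion (TC), whereas the 6.89-point gain quoted in the abstract is measured on SED, so the two figures are not directly comparable. As a quick check, the snippet below computes the TC improvement implied by the two rows above. It assumes the TC values are percentages; the variable names are illustrative.

```python
# TC values copied from the results table (assumed to be percentages).
tc_baseline = 14.13  # ARC
tc_l2stop = 16.68    # ARC + L2STOP

# Absolute gain in percentage points, and gain relative to the baseline.
absolute_gain = round(tc_l2stop - tc_baseline, 2)
relative_gain = round(100 * (tc_l2stop - tc_baseline) / tc_baseline, 2)

print(f"absolute: {absolute_gain} points, relative: {relative_gain}%")
```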
