
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong

2019-01-10 · ICLR 2019

Tasks: Vision-Language Navigation · Visual Navigation · Vision and Language Navigation · Natural Language Visual Grounding
Links: Paper · PDF · Code (official)

Abstract

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images, and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using our proposed method, we set the new state of the art by a significant margin (8% absolute increase in success rate on the unseen test set). Code is available at https://github.com/chihyaoma/selfmonitoring-agent.
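The two components described in the abstract map naturally onto a single decoding step: attention over instruction words (textual grounding), attention over surrounding view features (visual grounding), and a small head that regresses an estimate of navigation progress. The PyTorch sketch below illustrates that structure only; the class name SelfMonitoringStep, all dimensions, and the exact inputs to the progress head are illustrative assumptions rather than the authors' implementation, for which the official repository linked above is authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfMonitoringStep(nn.Module):
    """One decoding step of a self-monitoring agent: visual-textual
    co-grounding plus a progress monitor. Dimensions and layer choices
    are illustrative, not the authors' implementation."""

    def __init__(self, word_dim=256, img_dim=512, hidden_dim=256, max_len=80):
        super().__init__()
        self.text_query = nn.Linear(hidden_dim, word_dim)  # textual grounding query
        self.img_query = nn.Linear(hidden_dim, img_dim)    # visual grounding query
        self.lstm = nn.LSTMCell(word_dim + img_dim, hidden_dim)
        self.action_query = nn.Linear(hidden_dim + word_dim, img_dim)
        # Progress monitor: hidden state + (padded) instruction attention
        # -> scalar progress estimate in [-1, 1].
        self.progress_head = nn.Linear(hidden_dim + max_len, 1)
        self.max_len = max_len

    def forward(self, words, views, h, c):
        # words: (B, L, word_dim) instruction word features, L <= max_len
        # views: (B, K, img_dim)  features of K navigable directions
        # h, c:  (B, hidden_dim)  recurrent agent state

        # Textual grounding: which part of the instruction matters now.
        t_alpha = F.softmax(
            torch.bmm(words, self.text_query(h).unsqueeze(2)).squeeze(2), dim=1)
        text_ctx = torch.bmm(t_alpha.unsqueeze(1), words).squeeze(1)

        # Visual grounding: which surrounding view matters now.
        v_alpha = F.softmax(
            torch.bmm(views, self.img_query(h).unsqueeze(2)).squeeze(2), dim=1)
        img_ctx = torch.bmm(v_alpha.unsqueeze(1), views).squeeze(1)

        h, c = self.lstm(torch.cat([text_ctx, img_ctx], dim=1), (h, c))

        # Score each navigable direction against the grounded state.
        q = self.action_query(torch.cat([h, text_ctx], dim=1))
        action_logits = torch.bmm(views, q.unsqueeze(2)).squeeze(2)  # (B, K)

        # Progress monitor: estimate the fraction of the path completed.
        pad = F.pad(t_alpha, (0, self.max_len - t_alpha.size(1)))
        progress = torch.tanh(self.progress_head(torch.cat([h, pad], dim=1)))
        return action_logits, progress, h, c
```

As described in the paper, the progress estimate is supervised with the normalized distance to the goal and this regression loss is added to the action-prediction loss, which is what pushes the grounded instruction to track actual navigation progress.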

Results

Task: Vision and Language Navigation
Dataset: VLN Challenge
Model: Self-Monitoring Navigation Agent (no beam search; Progress Inference)

Metric          Value
error           5.67
length          18.04
oracle success  0.59
spl             0.35
success         0.48
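For reference, these are the standard VLN Challenge metrics: error is the mean distance (meters) from the agent's stopping point to the goal, length is the mean trajectory length, success is the fraction of episodes that stop within 3 m of the goal, oracle success credits an episode if any point along its path comes within 3 m, and spl is success weighted by shortest-over-taken path length (Anderson et al., 2018). A minimal sketch of these definitions, assuming per-episode records with hypothetical field names:

```python
def vln_metrics(episodes, success_radius=3.0):
    """Aggregate VLN Challenge-style metrics from per-episode records.

    Each episode dict is assumed to carry (hypothetical field names):
      nav_error    - meters from the agent's stopping point to the goal
      oracle_error - minimum meters to the goal anywhere along the path
      path_len     - meters actually traveled by the agent
      shortest_len - meters along the shortest path to the goal
    """
    n = len(episodes)
    success = sum(e["nav_error"] <= success_radius for e in episodes) / n
    oracle = sum(e["oracle_error"] <= success_radius for e in episodes) / n
    # SPL: success weighted by shortest/taken path length (Anderson et al., 2018).
    spl = sum(
        (e["nav_error"] <= success_radius)
        * e["shortest_len"] / max(e["path_len"], e["shortest_len"])
        for e in episodes
    ) / n
    return {
        "error": sum(e["nav_error"] for e in episodes) / n,
        "length": sum(e["path_len"] for e in episodes) / n,
        "oracle success": oracle,
        "spl": spl,
        "success": success,
    }
```

Note how SPL penalizes the long mean trajectory here (18.04 m): an agent can post a decent success rate while losing SPL by wandering before stopping.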

Related Papers

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models (2025-07-17)
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (2025-06-30)
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning (2025-06-20)
LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction (2025-06-16)
Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding (2025-06-12)
A Navigation Framework Utilizing Vision-Language Models (2025-06-11)
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations (2025-06-10)