Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

2019-11-18 · CVPR 2020
Tasks: Vision-Language Navigation, Navigate

Abstract

Vision-Language Navigation (VLN) is a task in which agents learn to navigate by following natural language instructions. The key to this task is to perceive both the visual scene and the natural language instruction sequentially. Conventional approaches exploit vision and language features through cross-modal grounding. However, the VLN task remains challenging, since previous works have neglected the rich semantic information contained in the environment (such as implicit navigation graphs or sub-trajectory semantics). In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks that exploit additional training signals derived from this semantic information. The auxiliary tasks have four reasoning objectives: explaining the previous actions, estimating the navigation progress, predicting the next orientation, and evaluating the trajectory consistency. These additional training signals help the agent acquire semantic representations that let it reason about its own activity and build a thorough perception of the environment. Our experiments indicate that the auxiliary reasoning tasks improve both main-task performance and model generalizability by a large margin. Empirically, we demonstrate that an agent trained with self-supervised auxiliary reasoning tasks substantially outperforms the previous state-of-the-art method and is the best existing approach on the standard benchmark.
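The abstract describes a main navigation objective trained jointly with four auxiliary objectives. The sketch below shows one plausible way such objectives could be combined into a single training loss, assuming a simple weighted sum. The abstract does not give the exact loss forms or weights, so `AuxWeights`, `auxrn_style_loss`, and the per-task loss choices (cross-entropy for the action and explanation tokens, squared error for progress, a cosine-based angular loss for orientation, binary cross-entropy for consistency) are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


def softmax_cross_entropy(logits: list[float], target: int) -> float:
    """-log softmax(logits)[target], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[target]


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def angular_loss(pred_rad: float, target_rad: float) -> float:
    """1 - cos(delta): zero when headings agree, and wrap-around safe (0 and 2*pi match)."""
    return 1.0 - math.cos(pred_rad - target_rad)


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    prob = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


@dataclass
class AuxWeights:
    # Hypothetical weights: the abstract does not say how the four terms are balanced.
    explain: float = 1.0
    progress: float = 1.0
    orientation: float = 1.0
    consistency: float = 1.0


def auxrn_style_loss(main: float, explain: float, progress: float,
                     orientation: float, consistency: float,
                     weights: Optional[AuxWeights] = None) -> float:
    """Main navigation loss plus a weighted sum of the four auxiliary losses (assumed form)."""
    w = weights if weights is not None else AuxWeights()
    return (main
            + w.explain * explain            # explaining the previous actions
            + w.progress * progress          # estimating navigation progress
            + w.orientation * orientation    # predicting the next orientation
            + w.consistency * consistency)   # evaluating trajectory consistency


if __name__ == "__main__":
    # One illustrative time step with made-up model outputs and targets.
    main = softmax_cross_entropy([2.0, 0.5, -1.0], target=0)    # next-action choice
    explain = softmax_cross_entropy([0.1, 1.5, 0.3], target=1)  # one word of the retold path
    progress = squared_error(pred=0.4, target=3 / 6)            # step 3 of a 6-step path
    orientation = angular_loss(pred_rad=0.2, target_rad=0.0)    # heading to next viewpoint
    consistency = binary_cross_entropy(prob=0.9, label=1)       # matched instruction/path pair
    print(auxrn_style_loss(main, explain, progress, orientation, consistency))
```

In this sketch every auxiliary target (a progress fraction, a heading, whether an instruction and a path belong together) is derived from the demonstration trajectory itself rather than from extra human labels, which is the sense in which such tasks are self-supervised. The angular loss is used instead of plain squared error so that headings of 0 and 2π are treated as identical.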

Results

Task: Vision and Language Navigation · Dataset: VLN Challenge

| Model | Error (m) | Length (m) | Oracle success | SPL | Success |
|---|---|---|---|---|---|
| Self-Supervised Auxiliary Reasoning Tasks (Beam Search) | 3.24 | 40.85 | 0.81 | 0.21 | 0.71 |
| Self-Supervised Auxiliary Reasoning Tasks (Pre-explore) | 3.69 | 10.43 | 0.75 | 0.65 | 0.68 |
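The five reported metrics (error, length, oracle success, success, SPL) are commonly defined from agent trajectories. As a rough sketch, assuming the usual Room-to-Room conventions (a 3 m success radius; SPL as success weighted by the ratio of shortest-path length to the length actually travelled), they can be computed as follows. `Episode`, `evaluate`, and the use of straight-line Euclidean distance in place of geodesic distance on the navigation graph are simplifying assumptions for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Point = tuple[float, float, float]

SUCCESS_RADIUS_M = 3.0  # assumed Room-to-Room convention: success if within 3 m of the goal


@dataclass
class Episode:
    trajectory: list[Point]   # positions the agent visited, from start to stop
    goal: Point
    shortest_path_len: float  # length of the shortest start-to-goal path (l_i in SPL)


def path_length(traj: list[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def evaluate(episodes: list[Episode]) -> dict[str, float]:
    """Average error, length, oracle success, success and SPL over episodes.

    Euclidean distance stands in for the geodesic (navigation-graph) distance
    that the benchmark itself uses.
    """
    totals = {"error": 0.0, "length": 0.0, "oracle_success": 0.0, "success": 0.0, "spl": 0.0}
    for ep in episodes:
        p = path_length(ep.trajectory)
        err = math.dist(ep.trajectory[-1], ep.goal)  # distance from the stop point to the goal
        closest = min(math.dist(x, ep.goal) for x in ep.trajectory)
        success = 1.0 if err < SUCCESS_RADIUS_M else 0.0
        totals["error"] += err
        totals["length"] += p
        totals["oracle_success"] += 1.0 if closest < SUCCESS_RADIUS_M else 0.0
        totals["success"] += success
        # SPL: success weighted by shortest length / max(travelled length, shortest length)
        denom = max(p, ep.shortest_path_len)
        totals["spl"] += success * (ep.shortest_path_len / denom if denom > 0 else 1.0)
    return {k: v / len(episodes) for k, v in totals.items()}


if __name__ == "__main__":
    direct = Episode([(0, 0, 0), (10, 0, 0)], goal=(10, 0, 0), shortest_path_len=10.0)
    detour = Episode([(0, 0, 0), (0, 15, 0), (10, 15, 0), (10, 0, 0)],
                     goal=(10, 0, 0), shortest_path_len=10.0)
    metrics = evaluate([direct, detour])
    print(metrics["success"], metrics["spl"])  # 1.0 0.625
```

The detour episode still succeeds, but its 40 m path gives it an SPL of only 0.25. The same effect is visible in the results: the beam-search entry travels much farther (length 40.85) and so scores a far lower SPL (0.21) than the pre-explore entry (length 10.43, SPL 0.65), even though its success rate is slightly higher.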

Related Papers

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking (2025-07-15)
Privacy-Preserving Multi-Stage Fall Detection Framework with Semi-supervised Federated Learning and Robotic Vision Confirmation (2025-07-14)
Automating MD simulations for Proteins using Large language Models: NAMD-Agent (2025-07-10)
Graph Learning (2025-07-08)
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions (2025-07-06)
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking (2025-07-04)