Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Neighbor-view Enhanced Model for Vision and Language Navigation

Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan

2021-07-15 · Navigate · Vision and Language Navigation

Paper · PDF · Code (official)

Abstract

Vision and Language Navigation (VLN) requires an agent to navigate to a target location by following natural language instructions. Most existing works represent a navigation candidate by the feature of the single view in which the candidate lies. However, an instruction may mention landmarks outside of that single view as references, which can lead to failures of textual-visual matching in existing methods. In this work, we propose a multi-module Neighbor-View Enhanced Model (NvEM) to adaptively incorporate visual contexts from neighbor views for better textual-visual matching. Specifically, our NvEM utilizes a subject module and a reference module to collect contexts from neighbor views. The subject module fuses neighbor views at a global level, and the reference module fuses neighbor objects at a local level. Subjects and references are adaptively determined via attention mechanisms. Our model also includes an action module to utilize the strong orientation guidance (e.g., "turn left") in instructions. Each module predicts a navigation action separately, and their weighted sum is used to predict the final action. Extensive experimental results demonstrate the effectiveness of the proposed method on the R2R and R4R benchmarks against several state-of-the-art navigators, and NvEM even beats some pre-training ones. Our code is available at https://github.com/MarSaKi/NvEM.
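
The fusion step described in the abstract (each module scores the navigation candidates separately, and a weighted sum of the per-module scores picks the final action) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class name, the learned weight head, and the tensor shapes are all hypothetical; see the official repository for the real model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedActionFusion(nn.Module):
    """Sketch of NvEM-style action fusion: three modules (subject,
    reference, action) each produce candidate scores, and a learned,
    state-conditioned weight combines them. Module internals are
    placeholders, not the paper's architecture."""

    def __init__(self, state_dim: int):
        super().__init__()
        # One scalar weight per module, predicted from the agent state
        # and normalized with softmax (an assumed design, for illustration).
        self.weight_head = nn.Linear(state_dim, 3)

    def forward(self, state, subject_logits, reference_logits, action_logits):
        # state: (batch, state_dim); each *_logits: (batch, num_candidates)
        w = F.softmax(self.weight_head(state), dim=-1)        # (batch, 3)
        stacked = torch.stack(
            [subject_logits, reference_logits, action_logits], dim=1
        )                                                      # (batch, 3, num_candidates)
        fused = (w.unsqueeze(-1) * stacked).sum(dim=1)         # (batch, num_candidates)
        return fused  # final candidate scores; argmax selects the action

# Usage with dummy tensors:
fusion = WeightedActionFusion(state_dim=512)
state = torch.randn(2, 512)
module_logits = [torch.randn(2, 8) for _ in range(3)]
scores = fusion(state, *module_logits)
action = scores.argmax(dim=-1)  # index of the chosen navigation candidate
```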

Results

Task                           | Dataset       | Metric         | Value | Model
-------------------------------|---------------|----------------|-------|-------
Vision and Language Navigation | VLN Challenge | error          | 4.37  | MM2021
Vision and Language Navigation | VLN Challenge | length         | 12.98 | MM2021
Vision and Language Navigation | VLN Challenge | oracle success | 0.66  | MM2021
Vision and Language Navigation | VLN Challenge | spl            | 0.54  | MM2021
Vision and Language Navigation | VLN Challenge | success        | 0.58  | MM2021
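
Of the metrics above, SPL (Success weighted by Path Length) is the standard headline number for VLN: per episode it multiplies binary success by the ratio of the shortest-path length to the length the agent actually traveled, then averages over episodes. A small illustrative helper (not tied to the paper's code; names and example numbers are made up):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length: for each episode i, compute
    S_i * l_i / max(p_i, l_i), where S_i is 1 on success and 0 otherwise,
    l_i is the shortest-path length to the goal, and p_i is the length of
    the agent's actual path; return the mean over all episodes."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# Example: a near-optimal success, a success via a long detour, and a failure.
print(spl([1, 1, 0], [10.0, 10.0, 10.0], [10.5, 20.0, 12.0]))  # ~0.484
```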

Related Papers

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking (2025-07-15)
Privacy-Preserving Multi-Stage Fall Detection Framework with Semi-supervised Federated Learning and Robotic Vision Confirmation (2025-07-14)
Automating MD simulations for Proteins using Large language Models: NAMD-Agent (2025-07-10)
Graph Learning (2025-07-08)
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions (2025-07-06)
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking (2025-07-04)