
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

Minyoung Hwang, Jaeyeon Jeong, Minsoo Kim, Yoonseon Oh, Songhwai Oh

2023-03-07 · CVPR 2023 · Visual Navigation · Vision and Language Navigation

Abstract

The main challenge in vision-and-language navigation (VLN) is understanding natural-language instructions in an unseen environment. The main limitation of conventional VLN algorithms is that if an action is mistaken, the agent fails to follow the instructions or explores unnecessary regions, ending up on an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct misled recent actions. We show that an exploitation policy, which moves the agent toward a well-chosen local goal among unvisited but observable states, outperforms a method that returns the agent to a previously visited state. We also highlight the need to identify regretful explorations using semantically meaningful clues. The key to our approach is understanding the object placements around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects. Combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows strong generalization performance. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1% and SPL by 20.6% on the SOON benchmark.
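The core technical step described above, the scene object spectrum, lends itself to a short sketch. Below is a minimal NumPy illustration assuming detections are first rasterized into per-category binary masks; the array shapes, the magnitude/fftshift post-processing, and the cosine-similarity matching are illustrative assumptions based only on the abstract, not the paper's exact formulation, and the function names are hypothetical.

```python
import numpy as np

def scene_object_spectrum(masks: np.ndarray) -> np.ndarray:
    """Category-wise 2D Fourier transform of detected objects.

    masks: (C, H, W) array, where masks[c] is a binary map marking
    pixels covered by detected objects of category c (an assumed
    rasterization step; the paper's preprocessing may differ).
    Returns (C, H, W) magnitude spectra, one per object category.
    """
    # 2D FFT over the spatial axes, independently for each category.
    spectra = np.fft.fft2(masks, axes=(-2, -1))
    # Keep magnitudes and center the zero-frequency component.
    return np.abs(np.fft.fftshift(spectra, axes=(-2, -1)))

def sos_similarity(query: np.ndarray, candidate: np.ndarray) -> float:
    """Illustrative matching score between two SOS features.

    Cosine similarity over flattened spectra is an assumption here;
    Meta-Explore's actual local-goal scoring may differ.
    """
    q, c = query.ravel(), candidate.ravel()
    return float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-8))
```

Under this reading, the agent would compute SOS features for each unvisited but observable candidate state, score them against the objects referenced in the instruction, and select the highest-scoring candidate as the local goal when correcting its path.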

Results

Task                 Dataset     Metric    Value   Model
Visual Navigation    R2R         SPL       0.61    Meta-Explore
Visual Navigation    SOON Test   Nav-SPL   25.8    Meta-Explore
Visual Navigation    SOON Test   SR        39.1    Meta-Explore
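For readers unfamiliar with the metrics: SR is success rate, and SPL is Success weighted by Path Length, the standard VLN efficiency metric from Anderson et al. (2018). Note that the R2R value is reported as a fraction while the SOON values are percentages. Below is a minimal sketch of the standard SPL definition; treating SOON's Nav-SPL as the same weighting applied to SOON's navigation success criterion is an assumption here.

```python
def spl(successes, path_lengths, shortest_lengths):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes[i]        -- 1 if episode i reached the goal, else 0
    path_lengths[i]     -- length of the path the agent actually took
    shortest_lengths[i] -- shortest-path length from start to goal
    """
    total = sum(
        s * (l_star / max(l, l_star))
        for s, l, l_star in zip(successes, path_lengths, shortest_lengths)
    )
    return total / len(successes)
```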

Related Papers

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (2025-06-30)
LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction (2025-06-16)
Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding (2025-06-12)
A Navigation Framework Utilizing Vision-Language Models (2025-06-11)
Enhancing Safety of Foundation Models for Visual Navigation through Collision Avoidance via Repulsive Estimation (2025-06-04)
Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion (2025-05-29)
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation (2025-05-27)