RMIT University

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

conference contribution
posted on 2024-11-03, 14:42 authored by Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang
Vision-Language Navigation (VLN) is a task where an agent learns to navigate following a natural language instruction. The key to this task is to perceive both the visual scene and natural language sequentially. Conventional approaches fully exploit vision and language features in cross-modal grounding. However, the VLN task remains challenging, since previous works have implicitly neglected the rich semantic information contained in environments (such as navigation graphs or sub-trajectory semantics). In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to exploit the additional training signals derived from this semantic information. The auxiliary tasks have four reasoning objectives: explaining the previous actions, evaluating the trajectory consistency, estimating the progress, and predicting the next direction. As a result, these additional training signals help the agent acquire knowledge of semantic representations in order to reason about its activities and build a thorough perception of environments. Our experiments demonstrate that auxiliary reasoning tasks improve both the performance of the main task and the model generalizability by a large margin. We further demonstrate empirically that an agent trained with self-supervised auxiliary reasoning tasks substantially outperforms the previous state-of-the-art method, making it the best existing approach on the standard benchmark.
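The abstract describes a main navigation objective trained together with four auxiliary objectives. As a rough illustration only, the sketch below combines one navigation loss with four auxiliary losses as a weighted sum, and derives a self-supervised progress label from the step index. Everything here is an assumption made for illustration and is not taken from the paper: the weighted-sum form, the `AuxWeights` defaults, the dictionary keys, the "fraction of steps completed" label, and the squared-error progress loss are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuxWeights:
    """Hypothetical per-task weights; the abstract does not specify any."""
    explain_actions: float = 1.0
    trajectory_consistency: float = 1.0
    progress: float = 1.0
    next_direction: float = 1.0


def progress_targets(num_steps):
    """Assumed self-supervised label: fraction of the trajectory completed
    after each step, computed from the step index alone (no annotation)."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return [(t + 1) / num_steps for t in range(num_steps)]


def progress_loss(predictions):
    """Mean squared error between predicted and assumed progress labels."""
    targets = progress_targets(len(predictions))
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def total_loss(nav_loss, aux_losses, weights=None):
    """Weighted sum of the main navigation loss and four auxiliary losses."""
    w = weights if weights is not None else AuxWeights()
    return (
        nav_loss
        + w.explain_actions * aux_losses["explain_actions"]
        + w.trajectory_consistency * aux_losses["trajectory_consistency"]
        + w.progress * aux_losses["progress"]
        + w.next_direction * aux_losses["next_direction"]
    )


if __name__ == "__main__":
    aux = {
        "explain_actions": 0.5,
        "trajectory_consistency": 0.25,
        "progress": progress_loss([0.2, 0.5, 0.8, 1.0]),
        "next_direction": 0.125,
    }
    print(total_loss(1.0, aux))
```

Setting every weight to zero recovers the main task alone, which is one simple way to compare training with and without the auxiliary signals.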

Funding

Towards data-efficient future action prediction in the wild

Australian Research Council


History

Start page

10009

End page

10019

Total pages

11

Outlet

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)

Name of conference

CVPR 2020

Publisher

IEEE

Place published

United States

Start date

2020-06-14

End date

2020-06-19

Language

English

Copyright

© 2020 IEEE.

Former Identifier

2006109339

Esploro creation date

2021-08-28
