A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
[article]
2021
arXiv
pre-print
The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction. Previous works learn to navigate step-by-step following an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations, harming the robustness and
arXiv:2112.04138v2
fatcat:3scvi3vzfvbcpglcfxrgcfbkcy