The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation

Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
[Figure 1 illustration: an agent follows the instruction "Exit the room. Walk past the display case and into the kitchen. Stop by the table." Its estimated progress is shown at each step, and when the estimate drops it reasons: "My estimated confidence decreased. Something went wrong. Let's learn this lesson and go back."]

Figure 1: Vision-and-Language Navigation task and our proposed regretful navigation agent. The agent leverages the self-monitoring mechanism [13] through time to decide when to roll back to a previous location and resume the instruction-following task. Our code is available at https://github.com/chihyaoma/regretful-agent.

Abstract

As deep learning continues to make progress on challenging perception tasks, there is increased interest in combining vision, language, and decision-making. Specifically, the Vision-and-Language Navigation (VLN) task involves navigating to a goal purely from language instructions and visual information, without explicit knowledge of the goal. Recent successful approaches have made inroads in achieving good success rates on this task, but they rely on beam search, which thoroughly explores a large number of trajectories and is unrealistic for applications such as robotics. In this paper, inspired by the intuition of viewing the problem as search on a navigation graph, we propose to use a progress monitor developed in prior work as a learnable heuristic for search. We then propose two modules incorporated into an end-to-end architecture: 1) a learned mechanism to perform backtracking, which decides whether to continue moving forward or roll back to a previous state (Regret Module), and 2) a mechanism that helps the agent decide which direction to go next by showing which directions have been visited and their associated progress estimates (Progress Marker). Combined, the proposed approach significantly outperforms current state-of-the-art methods using greedy action selection, with a 5% absolute improvement in success rate on the test server and, more importantly, an 8% improvement in success rate normalized by path length.

* Work partially done while the author was a research intern at Salesforce Research.
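The search view described in the abstract can be illustrated with a toy sketch. It assumes a navigation graph given as a Python adjacency dict and a hand-written progress function standing in for the learned progress monitor. In the paper, the Regret Module and Progress Marker are learned end-to-end; here both are replaced by fixed, hypothetical rules: roll back when the progress estimate drops below that of the previous location (or at a dead end), and never re-enter a direction already marked as visited. The helper `spl` uses the standard Success weighted by Path Length formula as one way to compute a path-length-normalized success rate. All names and values (`regretful_rollout`, `spl`, `stop_at`, the toy graph) are illustrative assumptions, not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical navigation graph: each node maps to its navigable neighbours.
Graph = Dict[str, List[str]]


def regretful_rollout(
    graph: Graph,
    start: str,
    progress: Callable[[str], float],
    stop_at: float = 0.9,
    max_steps: int = 20,
) -> List[str]:
    """Greedy navigation with rule-based stand-ins for the learned modules.

    `progress(node)` stands in for the progress monitor: an estimate in [0, 1]
    of how much of the instruction is complete once the agent is at `node`.
    """
    path = [start]                      # every location visited, including rollbacks
    history = [start]                   # stack of committed locations to roll back to
    marks = {start: progress(start)}    # Progress Marker stand-in: visited node -> estimate

    for _ in range(max_steps):
        current = history[-1]
        if marks[current] >= stop_at:   # confident the instruction is complete
            break
        # Regret rule (assumption): estimate fell relative to the previous location.
        if len(history) > 1 and marks[current] < marks[history[-2]]:
            history.pop()
            path.append(history[-1])
            continue
        # Greedy step among directions not yet marked as visited.
        candidates = [n for n in graph[current] if n not in marks]
        if not candidates:
            if len(history) == 1:
                break                   # nothing left to explore
            history.pop()               # dead end: roll back as well
            path.append(history[-1])
            continue
        nxt = max(candidates, key=progress)
        marks[nxt] = progress(nxt)
        history.append(nxt)
        path.append(nxt)
    return path


def spl(successes: Sequence[float], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length: mean over episodes of S * l / max(p, l)."""
    terms = [s * l / max(p, l) for s, l, p in zip(successes, shortest, taken)]
    return sum(terms) / len(terms)


# Toy episode (made-up values): B looks promising but leads to D, a dead end.
toy_graph: Graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"], "E": ["C"],
}
toy_progress = {"A": 0.1, "B": 0.4, "C": 0.3, "D": 0.2, "E": 0.95}

route = regretful_rollout(toy_graph, "A", toy_progress.__getitem__)
print(route)                                    # ['A', 'B', 'D', 'B', 'A', 'C', 'E']
# Unit-length edges assumed; the shortest route A -> C -> E has 2 edges.
print(spl([1.0], [2.0], [len(route) - 1]))      # ≈ 0.333 (2 / 6)
```

In this toy run the agent greedily follows B, sees its estimate drop at D, backtracks through B to A, and then reaches E via C. The episode still counts as a success, but the detour makes the route three times longer than the shortest one; a path-length-normalized metric penalizes that detour, whereas raw success rate does not.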
doi:10.1109/cvpr.2019.00689