VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
[article] 2019, arXiv pre-print
Embodied Question Answering (EQA) is a recently proposed task in which an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement …
arXiv:1908.04950v1