A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
All-in-One Image-Grounded Conversational Agents
[article]
2020
arXiv
pre-print
As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging individual language and image tasks, along with resources that incorporate both vision and language towards that objective. We design an architecture that combines state-of-the-art Transformer and ResNeXt modules fed into a novel attentive
arXiv:1912.12394v2
fatcat:heqeepgwsbbdfgdixum3cwb33y