A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019.
The file type is application/pdf.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
[article] 2017, arXiv pre-print
Problems at the intersection of vision and language are of significant importance, both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information and leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and
arXiv:1612.00837v3