A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019.
The file type is application/pdf.
The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR
[article]
2018
arXiv
pre-print
Visual QA is a pivotal challenge for higher-level reasoning, requiring an understanding of language, vision, and the relationships between many objects in a scene. Although datasets like CLEVR are designed to be unsolvable without such complex relational reasoning, some surprisingly simple feed-forward, "holistic" models have recently shown strong performance on this dataset. These models lack any kind of explicit iterative, symbolic reasoning procedures, which are hypothesized to be necessary for
arXiv:1809.04482v1
fatcat:ptaaaefzjrhwpi7rloa57qzoyi