A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2023.
The file type is application/pdf.
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
[article]
2023
arXiv
pre-print
Existing visual question answering methods tend to capture spurious cross-modal correlations and fail to discover the true causal mechanism that facilitates truthful reasoning based on the dominant visual evidence and the question intention. Additionally, existing methods usually ignore cross-modal event-level understanding, which requires jointly modeling event temporality, causality, and dynamics. In this work, we focus on event-level visual question answering from a new
arXiv:2207.12647v4
fatcat:plmxuyoskvc5fdz5qxqzxis2am