A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2023; you can also visit the original URL.
The file type is application/pdf
.
Joint Answering and Explanation for Visual Commonsense Reasoning
[article]
2023
arXiv
pre-print
Visual Commonsense Reasoning (VCR), deemed as one challenging extension of the Visual Question Answering (VQA), endeavors to pursue a more high-level visual comprehension. It is composed of two indispensable processes: question answering over a given image and rationale inference for answer explanation. Over the years, a variety of methods tackling VCR have advanced the performance on the benchmark dataset. Despite significant as these methods are, they often treat the two processes in a
arXiv:2202.12626v2
fatcat:6sloh57og5harom6nlqblvrbye