File type: application/pdf
Vision–Language–Knowledge Co-Embedding for Visual Commonsense Reasoning
Sensors, 2021
Visual commonsense reasoning is the task of selecting the most appropriate answer to a question, together with a rationale for that answer, given an image, a natural-language question, and candidate responses. Effective visual commonsense reasoning requires solving both the knowledge-acquisition problem and the multimodal-alignment problem. We therefore propose a novel Vision–Language–Knowledge Co-embedding (ViLaKC) model that extracts knowledge
doi:10.3390/s21092911
pmid:33919196
fatcat:ffoa2x6cindp3masddynz2qdmu