VQA-LOL: Visual Question Answering under the Lens of Logic
[article]
2020
arXiv
pre-print
When put under this Lens of Logic, state-of-the-art VQA models have difficulty in correctly answering these logically composed questions. ...
In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question about an image are able to answer the logical composition of multiple such questions. ...
arXiv:2002.08325v2
fatcat:dft3d4x7cjccdk4luspbcuf7ga
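The logical composition described in the VQA-LOL entry above can be illustrated with a small Python sketch. This is not code from the paper; the templates and helper names are assumptions, showing only how two closed binary questions about one image might be conjoined or disjoined, and how the gold answer of the composed question follows from the component answers.

# Hypothetical illustration (not code from the VQA-LOL paper): compose two
# closed binary questions about the same image with AND / OR and derive the
# gold answer of the composed question from the component answers.

def compose_question(q1: str, q2: str, op: str) -> str:
    """Join two yes/no questions with a logical connective template."""
    left, right = q1.rstrip("?"), q2.rstrip("?").lower()
    if op not in ("and", "or"):
        raise ValueError(f"unsupported operator: {op}")
    return f"{left} {op} {right}?"

def compose_answer(a1: bool, a2: bool, op: str) -> bool:
    """Gold answer of the composed question from the component gold answers."""
    return (a1 and a2) if op == "and" else (a1 or a2)

q1, a1 = "Is the man wearing a hat?", True       # VQA v2-style binary question
q2, a2 = "Is there a dog in the picture?", False

print(compose_question(q1, q2, "or"), "->", "yes" if compose_answer(a1, a2, "or") else "no")
# Is the man wearing a hat or is there a dog in the picture? -> yes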
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
[article]
2021
arXiv
pre-print
To investigate, we conduct a host of thorough evaluations on existing pre-trained models over 4 different types of V+L specific model robustness: (i) Linguistic Variation; (ii) Logical Reasoning; (iii) Visual Content Manipulation; and (iv) Answer Distribution Shift. ...
It consists of two datasets: VQA-LOL Compose (logical combinations of multiple closed binary questions about the same image in VQA v2) and VQA-LOL Supplement (logical combinations of additional questions ...
arXiv:2012.08673v2
fatcat:orl3dt3r3fg3xjac2rt4xwqxxu
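As a rough illustration of the evaluation protocol sketched in the entry above, the following Python snippet scores a pretrained model on a standard split and on the four robustness splits, reporting the accuracy drop per split. The split names and the callable `model(image, question)` interface are assumptions, not the paper's API.

# Rough sketch (assumed interface, not the paper's code): compare one
# pretrained V+L model's accuracy on a base split against four robustness
# splits, reporting the drop per split.

from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str, str]  # (image id, question, gold answer)

def accuracy(model: Callable[[str, str], str], split: List[Example]) -> float:
    """Fraction of examples the model answers exactly right."""
    correct = sum(model(img, q) == ans for img, q, ans in split)
    return correct / max(len(split), 1)

def robustness_report(model: Callable[[str, str], str],
                      base_split: List[Example],
                      robustness_splits: Dict[str, List[Example]]) -> Dict[str, float]:
    """Accuracy drop on each robustness split relative to the base split."""
    base = accuracy(model, base_split)
    return {name: base - accuracy(model, split)
            for name, split in robustness_splits.items()}

# The four robustness types listed in the entry above.
SPLIT_NAMES = ["linguistic variation", "logical reasoning",
               "visual content manipulation", "answer distribution shift"]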
WeaQA: Weak Supervision via Captions for Visual Question Answering
[article]
2021
arXiv
pre-print
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets. ...
Our experiments on three VQA benchmarks demonstrate the efficacy of this weakly-supervised approach, especially on the VQA-CP challenge, which tests performance under changing linguistic priors. ...
Acknowledgements The authors acknowledge support from the DARPA SAIL-ON program W911NF2020006, ONR award N00014-20-1-2332, and NSF grant 1816039, and the anonymous reviewers for their insightful discussion ...
arXiv:2012.02356v2
fatcat:yoqklfrx2vhctm7u24elycwwsi
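The weak-supervision idea in the WeaQA entry above, deriving training signal from captions instead of human-annotated I-Q-A triplets, can be sketched with a toy template. This is not the paper's synthesis procedure; `caption_to_qa` and the masking template are hypothetical, included only to show how a caption can yield synthetic question-answer pairs.

# Toy illustration of weak supervision from captions (not WeaQA's actual
# procedure): turn a declarative caption into synthetic question-answer pairs
# by replacing a known phrase with "what".

import re

def caption_to_qa(caption: str, answer_candidates: list) -> list:
    """Create (question, answer) pairs by masking each candidate phrase."""
    pairs = []
    for answer in answer_candidates:
        if re.search(rf"\b{re.escape(answer)}\b", caption, flags=re.IGNORECASE):
            question = re.sub(rf"\b{re.escape(answer)}\b", "what", caption,
                              count=1, flags=re.IGNORECASE)
            pairs.append((question.capitalize().rstrip(".") + "?", answer))
    return pairs

caption = "A man is riding a horse on the beach."
print(caption_to_qa(caption, ["horse", "beach"]))
# [('A man is riding a what on the beach?', 'horse'),
#  ('A man is riding a horse on the what?', 'beach')]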
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
[article]
2020
arXiv
pre-print
While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. ...
Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). ...
Acknowledgements The authors acknowledge support from the NSF Robust Intelligence Program project #1816039, the DARPA KAIROS program (LESTAT project), the DARPA SAIL-ON program, and ONR award N00014-20 ...
arXiv:2009.08566v2
fatcat:hpbd4nm5pzh3zc6gmxudcnylaa
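A minimal sketch of a consistency-constrained objective in the spirit of the MUTANT entry above, assuming a PyTorch model that outputs answer logits for an original input and for a mutated (semantically edited) input. The exact loss used by the paper is not reproduced here; this formulation is illustrative.

# Illustrative consistency-constrained loss (assumed formulation, not the
# paper's): standard cross-entropy on the original and mutated samples, plus
# a divergence penalty applied only where the mutation should not change the
# gold answer.

import torch
import torch.nn.functional as F

def consistency_vqa_loss(logits_orig, target_orig, logits_mut, target_mut, alpha=1.0):
    """Cross-entropy on both samples plus a KL term between the two answer
    distributions when the gold answers agree."""
    ce = F.cross_entropy(logits_orig, target_orig) + F.cross_entropy(logits_mut, target_mut)
    same_answer = (target_orig == target_mut).float()
    log_p_orig = F.log_softmax(logits_orig, dim=-1)
    p_mut = F.softmax(logits_mut, dim=-1)
    consistency = (same_answer *
                   F.kl_div(log_p_orig, p_mut, reduction="none").sum(-1)).mean()
    return ce + alpha * consistency

# Example with a batch of 2 and a 10-way answer vocabulary.
logits_o, logits_m = torch.randn(2, 10), torch.randn(2, 10)
targets_o, targets_m = torch.tensor([3, 7]), torch.tensor([3, 2])
print(consistency_vqa_loss(logits_o, targets_o, logits_m, targets_m))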
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering
[article]
2021
arXiv
pre-print
Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. ...
The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on given knowledge. ...
Acknowledgements The authors acknowledge support from the NSF grant 1816039, DARPA grant W911NF2020006, DARPA grant FA875019C0003, and ONR award N00014-20-1-2332; and thank the reviewers for their feedback ...
arXiv:2109.04014v1
fatcat:rnm2ghrosbd4xkctt4jnozfndu
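The retriever-reader split described in the entry above can be sketched as two stages over vector representations. The encoders, scoring functions, and shapes below are assumptions, not the paper's implementation: the retriever ranks knowledge passages against an image+question query, and the reader scores candidate answers against the retrieved evidence.

# Hypothetical two-stage pipeline: retrieve top-k knowledge passages by
# similarity, then pick the answer that best matches the pooled evidence.

import numpy as np

def retrieve(query_vec: np.ndarray, passage_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the top-k passages by dot-product similarity."""
    scores = passage_vecs @ query_vec
    return np.argsort(-scores)[:k]

def read(query_vec: np.ndarray, passage_vecs: np.ndarray,
         answer_vecs: np.ndarray, top_k_idx: np.ndarray) -> int:
    """Score each candidate answer against the retrieved evidence."""
    evidence = passage_vecs[top_k_idx].mean(axis=0) + query_vec
    return int(np.argmax(answer_vecs @ evidence))

rng = np.random.default_rng(0)
query = rng.normal(size=64)              # stand-in for the image+question encoding
passages = rng.normal(size=(100, 64))    # stand-in for external knowledge snippets
answers = rng.normal(size=(500, 64))     # stand-in for the answer vocabulary
print(read(query, passages, answers, retrieve(query, passages)))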
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
[article]
2021
arXiv
pre-print
Answering semantically complicated questions about an image is challenging in the Visual Question Answering (VQA) task. ...
Firstly, it not only builds a graph for the image, but also constructs a graph for the question in terms of both syntactic and embedding information. ...
... Yang, "Vqa-lol: Visual question answering under the lens of logic," in European Conference ...
[51] X. Chen, H. Fang, T. Y. Lin, R. Vedantam, S. Gupta, P. ...
arXiv:2112.07270v1
fatcat:oco2bjv4rrfpjfylwcmxa2pfky
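A minimal sketch, under assumed shapes and omitting the syntactic graph construction, of the bilateral cross-modal attention the entry above describes: image-graph node features attend over question-graph node features and vice versa, and both attended contexts are fused. This is an illustration, not the paper's architecture.

# Illustrative bilateral cross-graph attention over node features
# (assumed shapes; not the paper's model).

import torch
import torch.nn.functional as F

def cross_graph_attention(nodes_a: torch.Tensor, nodes_b: torch.Tensor) -> torch.Tensor:
    """Attend nodes_a (Na x d) over nodes_b (Nb x d); returns Na x d."""
    attn = F.softmax(nodes_a @ nodes_b.T / nodes_a.size(-1) ** 0.5, dim=-1)
    return attn @ nodes_b

def bilateral_fuse(image_nodes: torch.Tensor, question_nodes: torch.Tensor) -> torch.Tensor:
    """Fuse both directions of cross-modal attention into one vector."""
    img_ctx = cross_graph_attention(image_nodes, question_nodes)   # image -> question
    qst_ctx = cross_graph_attention(question_nodes, image_nodes)   # question -> image
    return torch.cat([img_ctx.mean(0), qst_ctx.mean(0)], dim=-1)

fused = bilateral_fuse(torch.randn(36, 128), torch.randn(12, 128))
print(fused.shape)  # torch.Size([256])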
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
[article]
2020
arXiv
pre-print
Additionally we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. ...
Thus for video understanding, such as when captioning videos or when answering questions about videos, one must have an understanding of these commonsense aspects. ...
ZF, TG, YY thank the organizers and the participants of the Telluride Neuromorphic Cognition Workshop, especially the Machine Common Sense (MCS) group. ...
arXiv:2003.05162v3
fatcat:xgri7zaajjejhmujw5crlxmnti
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
unpublished
... Vqa-lol: Visual question answering under the lens of logic. In Proceedings of ...
... Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. ...
Self-supervised vqa: Answering visual questions using images and captions. arXiv ...
... v in vqa matter: Elevating the role of image understanding in visual question answering. ...
doi:10.18653/v1/2021.emnlp-main.512
fatcat:ip333delvzhgbgicuibho7wiju
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
unpublished
Additionally we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. ...
Towards ai-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698. ... 2017. Video question answering via gradually refined attention over appearance and motion. ...
ZF, TG, YY thank the organizers and the participants of the Telluride Neuromorphic Cognition Workshop, especially the Machine Common Sense (MCS) group. ...
doi:10.18653/v1/2020.emnlp-main.61
fatcat:lrtywfat25ejbmct72jmlkxane