461 Hits in 2.8 sec

WeaQA: Weak Supervision via Captions for Visual Question Answering [article]

Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
<span title="2021-05-28">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.  ...  Additionally, we demonstrate the efficacy of spatial-pyramid image patches as a simple but effective alternative to dense and costly object bounding box annotations used in existing VQA models.  ...  Acknowledgements The authors acknowledge support from the DARPA SAIL-ON program W911NF2020006, ONR award N00014-20-1-2332, and NSF grant 1816039, and the anonymous reviewers for their insightful discussion  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.02356v2">arXiv:2012.02356v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yoqklfrx2vhctm7u24elycwwsi">fatcat:yoqklfrx2vhctm7u24elycwwsi</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210601181423/https://arxiv.org/pdf/2012.02356v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ad/9e/ad9e0e3b7218b20a8fca4a54fe6e86fd579468a2.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.02356v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning [article]

Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein
<span title="2020-12-10">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
...  Video Question Answering and Text-To-Video Retrieval.  ...  Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation.  ...  Downstream Tasks Video Visual Question Answering (VQA). The Video VQA task comprises answering questions about videos presented in natural language (Antol et al. 2015).  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.03186v3">arXiv:2003.03186v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/p576x72txrhuzgesvvgs7gbsui">fatcat:p576x72txrhuzgesvvgs7gbsui</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201212014846/https://arxiv.org/pdf/2003.03186v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/80/1e/801ef525c9c303ec6fbe46e818743c4ed22e1c2f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.03186v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA [article]

Yash Khare, Viraj Bagal, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar
<span title="2021-04-03">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Consequently, it is infeasible to directly employ general domain Visual Question Answering (VQA) models for the medical domain.  ...  Our method involves learning richer medical image and text semantic representations using Masked Language Modeling (MLM) with image features as the pretext task on a large medical image+caption dataset  ...  INTRODUCTION AND RELATED WORK Visual question answering (VQA) on medical images aspires to build models that can answer diagnostically relevant natural language questions asked on medical images.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2104.01394v1">arXiv:2104.01394v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/nfgvcaftzjgslec2fuujidsm5e">fatcat:nfgvcaftzjgslec2fuujidsm5e</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210407001849/https://arxiv.org/pdf/2104.01394v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/a9/98/a998eca33d296db79cba03e2cf0e63dd2462b981.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2104.01394v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, Yongdong Zhang
<span title="">2020</span> <i title="International Joint Conferences on Artificial Intelligence Organization"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/vfwwmrihanevtjbbkti2kc3nke" style="color: black;">Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence</a> </i> &nbsp;
Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases.  ...  Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on the high-frequency answers (e.g., yellow) ignoring image contents.  ...  Theoretically, we believe that our work can be a meaningful step in realistic VQA and solving the language bias issue, and this self-supervision can be generalized to other tasks (e.g. image caption) that  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.24963/ijcai.2020/151">doi:10.24963/ijcai.2020/151</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/ijcai/ZhuMLZWZ20.html">dblp:conf/ijcai/ZhuMLZWZ20</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/swzgwy4pqjea5h6fteujmtvv7i">fatcat:swzgwy4pqjea5h6fteujmtvv7i</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201104083723/https://www.ijcai.org/Proceedings/2020/0151.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/53/58/5358b69f8175dfce24f776d8fe8fd520a7121758.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.24963/ijcai.2020/151"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

All You May Need for VQA are Image Captions [article]

Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
<span title="2022-05-04">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation.  ...  In this paper, we propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question  ...  the Conceptual Captions, Nassim Oufattole for his early exploration of question generation, Gal Elidan, Sasha Goldshtein, and Avinatan Hassidim for their useful feedback.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2205.01883v1">arXiv:2205.01883v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/q7v76mbry5asxd5aua2g3z4qsa">fatcat:q7v76mbry5asxd5aua2g3z4qsa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220506085238/https://arxiv.org/pdf/2205.01883v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/7f/51/7f5170b8ec68629164a98f8dfa1d2cbef5bbe5f5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2205.01883v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Estimating semantic structure for the VQA answer space [article]

Corentin Kervadec
<span title="2021-04-08">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image), has always been treated as a classification problem over a set of predefined answers.  ...  We address this issue by proposing (1) two measures of proximity between VQA classes, and (2) a corresponding loss which takes into account the estimated proximity.  ...  Introduction Visual Question Answering (VQA) is a task which requires to provide a textual answer given a question and an image as input.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05726v2">arXiv:2006.05726v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/l6w2uqnoa5htbcc5gt7takyxyy">fatcat:l6w2uqnoa5htbcc5gt7takyxyy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200623111741/https://arxiv.org/pdf/2006.05726v1.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05726v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

VQA with no questions-answers training [article]

Ben-Zion Vatashsky, Shimon Ullman
<span title="2020-05-26">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Training is performed for the language part and the visual part on their own, but unlike existing schemes, the method does not require any training using images with associated questions and answers.  ...  In addition, it can provide explanations to its answers and suggest alternatives when questions are not grounded in the image.  ...  Acknowledgements: This work was supported by EU Horizon 2020 Framework 785907 and ISF grant 320/16.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.08481v2">arXiv:1811.08481v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ck6egxbmsfbi3l4lewuiaiyvwm">fatcat:ck6egxbmsfbi3l4lewuiaiyvwm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200930214651/https://arxiv.org/pdf/1811.08481v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/81/b0/81b0d547245b7f291638869cbc95e29ba9ae15e9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.08481v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

LaTr: Layout-Aware Transformer for Scene-Text VQA [article]

Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha
<span title="2021-12-24">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr).  ...  We show that applying this pre-training scheme on scanned documents has certain advantages over using natural images, despite the domain gap.  ...  Introduction Scene-Text VQA (STVQA) aims to answer questions by utilizing the scene text in the image.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2112.12494v2">arXiv:2112.12494v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/chdp2ozx5vfmromsdxksjwf63e">fatcat:chdp2ozx5vfmromsdxksjwf63e</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220104174901/https://arxiv.org/pdf/2112.12494v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/3f/43/3f43b4239c6955b4c6647c0801fbbbcdea91a320.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2112.12494v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA [article]

Badri N. Patro, Anupriy, Vinay P. Namboodiri
<span title="2019-11-19">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is challenging to provide supervision for attention.  ...  Visualization of the results also confirms our hypothesis that attention maps improve using this form of supervision.  ...  Related work Visual question answering (VQA) was first proposed by (Malinowski and Fritz, 2014).  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1911.08618v1">arXiv:1911.08618v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/53zu35ucajgwpjyihopxnjvnci">fatcat:53zu35ucajgwpjyihopxnjvnci</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200824064812/https://arxiv.org/pdf/1911.08618v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/da/be/dabe557eaad9e326a5b44c04fa619b2118f4bda5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1911.08618v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment [article]

Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
<span title="2022-03-14">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We achieve competitive zero/few-shot results on the visual question answering and visual entailment tasks without introducing any additional pre-training procedure.  ...  We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.  ...  (No.62076081, No.61772153, and No.61936010), and Natural Science Foundation of Heilongjiang (No.YQ2021F006).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2203.07190v1">arXiv:2203.07190v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/whf2ljh2mjfa5l4wsbr5dpvktq">fatcat:whf2ljh2mjfa5l4wsbr5dpvktq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220326080950/https://arxiv.org/pdf/2203.07190v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/18/bd/18bd22b1b6091bec3c4b8f51ef97c7f11d7f110e.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2203.07190v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

COIN: Counterfactual Image Generation for VQA Interpretation [article]

Zeyd Boukhers, Timo Hartmann, Jan Jürjens
<span title="2022-01-10">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Due to the significant advancement of Natural Language Processing and Computer Vision-based models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced.  ...  Specifically, the generated image is supposed to have the minimal possible change to the original image and leads the VQA model to give a different answer.  ...  [53] propose a self-supervised learning framework that balances the training data but first, identifies whether a given question-image pair is relevant (i.e., the image contains critical information  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2201.03342v1">arXiv:2201.03342v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kqalox6s2bfwhiuvh77eu3fb7y">fatcat:kqalox6s2bfwhiuvh77eu3fb7y</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220112041724/https://arxiv.org/pdf/2201.03342v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/40/2f/402f2a8d413d666861cd243b3d51e7c2c011e9e5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2201.03342v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

A negative case analysis of visual grounding methods for VQA [article]

Robik Shrestha, Kushal Kafle, Christopher Kanan
<span title="2020-04-15">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons.  ...  To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains.  ...  We are grateful to Tyler Hayes for agreeing to review the paper at short notice and suggesting valuable edits and corrections for the paper.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2004.05704v2">arXiv:2004.05704v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/syxdoomp3jbajomqfus7gdgtia">fatcat:syxdoomp3jbajomqfus7gdgtia</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200417221753/https://arxiv.org/pdf/2004.05704v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2004.05704v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? [article]

Corentin Kervadec
<span title="2021-04-07">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Models for Visual Question Answering (VQA) are notorious for their tendency to rely on dataset biases, as the large and unbalanced diversity of questions and concepts involved and tends to prevent models  ...  We propose the GQA-OOD benchmark designed to overcome these concerns: we measure and compare accuracy over both rare and frequent question-answer pairs, and argue that the former is better suited to the  ...  They tend to answer questions without using the image, and even when they do, they do not always exploit relevant visual regions [10].  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05121v3">arXiv:2006.05121v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kzi26bk2jzdc3fsfnhqv2f6omm">fatcat:kzi26bk2jzdc3fsfnhqv2f6omm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210409042520/https://arxiv.org/pdf/2006.05121v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/47/51/47518edfa16da76e2097b2239fca123ddd9bb26c.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05121v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Fair-VQA: Fairness-aware Visual Question Answering through Sensitive Attribute Prediction

Sungho Park, Sunhee Hwang, Jongkwang Hong, Hyeran Byun
<span title="">2020</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/q7qi7j4ckfac7ehf3mjbso4hne" style="color: black;">IEEE Access</a> </i> &nbsp;
VISUAL QUESTION ANSWERING Encoding informative representation from questions and images is important to improve Visual Question Answering (VQA) performances [15], [32].  ...  [27] shows that existing image captioning models generate unfair captions in terms of gender and proposes a fairness-aware image captioning model.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/access.2020.3041503">doi:10.1109/access.2020.3041503</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2eaturrhdfekvc6nfc5sovkx2u">fatcat:2eaturrhdfekvc6nfc5sovkx2u</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201204151331/https://ieeexplore.ieee.org/ielx7/6287639/6514899/09274341.pdf?tp=&amp;arnumber=9274341&amp;isnumber=6514899&amp;ref=" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/dd/ec/ddecfc5a0646684cb04cc8247c8b3ca73e94c692.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/access.2020.3041503"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> ieee.com </button> </a>

Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA

Badri Patro, Anupriy, Vinay Namboodiri
<span title="2020-04-03">2020</span> <i title="Association for the Advancement of Artificial Intelligence (AAAI)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/wtjcymhabjantmdtuptkk62mlq" style="color: black;">PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE</a> </i> &nbsp;
In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is challenging to provide supervision for attention.  ...  Visualization of the results also confirms our hypothesis that attention maps improve using this form of supervision.  ...  Related work Visual question answering (VQA) was first proposed by (Malinowski and Fritz 2014).  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v34i07.6858">doi:10.1609/aaai.v34i07.6858</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/g4e5ujrthbbldn4qpc3hxm2jpm">fatcat:g4e5ujrthbbldn4qpc3hxm2jpm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201103212015/https://aaai.org/ojs/index.php/AAAI/article/download/6858/6712" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/43/90/439056ed0b03d99e0a4919add7b3b0c9ccd7ed8c.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v34i07.6858"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>
Showing results 1-15 of 461