9,945 Hits in 3.0 sec

General Greedy De-bias Learning [article]

Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
2021 arXiv   pre-print
Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method.  ...  GGD can learn a more robust base model under both settings: task-specific biased models with prior knowledge and a self-ensemble biased model without prior knowledge.  ...  GQA-OOD is a more challenging dataset for Visual Question Answering.  ... 
arXiv:2112.10572v2 fatcat:m77xghq6wfgntgzfiq3wq67xla

Hadamard Product for Low-rank Bilinear Pooling [article]

Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang
2017 arXiv   pre-print
We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism in multimodal learning.  ...  They have been applied to various visual tasks, such as object recognition, segmentation, and visual question answering, to achieve state-of-the-art performance by taking advantage of the expanded representations  ...  ACKNOWLEDGMENTS The authors would like to thank Patrick Emaase for helpful comments and editing. Also, we are thankful to the anonymous reviewers who provided comments to improve this paper.  ... 
arXiv:1610.04325v4 fatcat:xljlamvppvabnk4cwnbjno5gke
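The low-rank bilinear pooling this abstract describes has a very compact form: project both modalities, take their element-wise (Hadamard) product, and project again. A minimal PyTorch sketch, with illustrative dimensions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    """Low-rank bilinear pooling via the Hadamard product (sketch).

    A full bilinear map needs one d_x * d_y matrix per output unit;
    factoring each matrix as U_i V_i^T reduces the whole map to an
    element-wise product of two linear projections.
    """
    def __init__(self, d_x, d_y, d_joint, d_out):
        super().__init__()
        self.U = nn.Linear(d_x, d_joint)    # projects the image feature
        self.V = nn.Linear(d_y, d_joint)    # projects the question feature
        self.P = nn.Linear(d_joint, d_out)  # maps the joint space to outputs

    def forward(self, x, y):
        joint = torch.tanh(self.U(x)) * torch.tanh(self.V(y))  # Hadamard product
        return self.P(joint)

# Toy usage: a 2048-d image feature fused with a 1024-d question feature.
pool = LowRankBilinearPooling(2048, 1024, 512, 1000)
answer_logits = pool(torch.randn(8, 2048), torch.randn(8, 1024))  # (8, 1000)
```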

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation [article]

Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong
2017 arXiv   pre-print
We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task.  ...  ., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and  ...  These phenomena highlight the need for explicit links between the visual and text answers, realized in this work as VQS.  ... 
arXiv:1708.04686v1 fatcat:mjd2ypzatfcbtcyity2weffduy

Greedy Gradient Ensemble for Robust Visual Question Answering [article]

Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
2021 arXiv   pre-print
Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.  ...  We further propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning.  ...  The most effective solution so far is ensemble-based, which formulates a question-only branch as explicit modelling for language bias. Ramakrishnan et al.  ... 
arXiv:2107.12651v4 fatcat:fj3yfxqtkjgpfg5gpem5b5jdg4
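The ensemble-based recipe this abstract points to (a question-only branch that soaks up language bias so the base model is supervised on what remains) can be sketched generically. This shows the common bias-ensemble pattern, not GGE's exact greedy gradient rule; `base_logits` and `bias_logits` are hypothetical stand-ins for the two branches' outputs:

```python
import torch
import torch.nn.functional as F

def debias_loss(base_logits, bias_logits, labels):
    """Generic ensemble de-biasing step (sketch).

    The biased (e.g. question-only) branch is fit to the labels on its
    own; the base model is trained through the combined prediction, so
    its gradient concentrates on examples the bias cannot explain.
    """
    fused = base_logits + bias_logits.detach()        # bias branch frozen here
    loss_base = F.cross_entropy(fused, labels)        # supervises the base model
    loss_bias = F.cross_entropy(bias_logits, labels)  # fits the bias branch
    return loss_base + loss_bias

# Toy check over 10 answer classes.
base = torch.randn(4, 10, requires_grad=True)
bias = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
debias_loss(base, bias, labels).backward()
```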

Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module

Juan Pavez, Héctor Allende, Héctor Allende-Cid
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
To do that, models like Memory Networks (MemNNs) have combined external memory stores and attention mechanisms.  ...  Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.  ...  Table 2 shows the attention values for visual and textual question answering.  ... 
doi:10.18653/v1/p18-1092 dblp:conf/acl/PavezAA18 fatcat:ecc2tayjljgcrhygazxajpu244
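The relational reasoning module named in the title follows the Relation Network pattern: score every pair of memory slots with a shared MLP and sum the results. A minimal sketch, with illustrative dimensions rather than the paper's exact hyper-parameters:

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Relation-Network-style reasoning over memory slots (sketch)."""
    def __init__(self, d_mem, d_hidden, d_out):
        super().__init__()
        # Shared MLP g applied to every ordered pair of memories.
        self.g = nn.Sequential(
            nn.Linear(2 * d_mem, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, memories):                  # (batch, n_slots, d_mem)
        b, n, d = memories.shape
        left = memories.unsqueeze(2).expand(b, n, n, d)
        right = memories.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([left, right], dim=-1)  # all ordered slot pairs
        return self.g(pairs).sum(dim=(1, 2))      # aggregate pairwise relations

rel = RelationModule(d_mem=64, d_hidden=128, d_out=32)
out = rel(torch.randn(2, 10, 64))                 # -> (2, 32)
```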

Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module [article]

Juan Pavez, Héctor Allende, Héctor Allende-Cid
2018 arXiv   pre-print
To do that, models like Memory Networks (MemNNs) have combined external memory stores and attention mechanisms.  ...  Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.  ...  Table 2 shows the attention values for visual and textual question answering.  ... 
arXiv:1805.09354v1 fatcat:qzqlac5tzfbsnbj6o5vq6mtqey

Learning Visual Knowledge Memory Networks for Visual Question Answering

Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen, Jianguo Li
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions cannot be directly or clearly answered from visual content but require reasoning  ...  Second is the mechanism for handling multiple knowledge facts expanded from question and answer pairs.  ...  Acknowledgements Thanks to Zhiqiang Shen for helping prepare some illustrations for our early submissions.  ... 
doi:10.1109/cvpr.2018.00807 dblp:conf/cvpr/SuZDCCL18 fatcat:v2wtlp3ewzcsjgsprj6pcnq7uq
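The memory mechanism the abstract describes (querying stored knowledge facts to answer questions the image alone cannot) reduces, at its core, to an attention read over a key-value store. A sketch of that core operation, not the paper's full architecture; all shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def memory_read(query, keys, values):
    """Attention read over a key-value knowledge memory (sketch).

    query:  (batch, d)    question/context embedding
    keys:   (n_facts, d)  one key embedding per stored knowledge fact
    values: (n_facts, d)  the content retrieved when a key matches
    """
    scores = query @ keys.t()                # similarity to each fact
    weights = F.softmax(scores, dim=-1)      # attention over the facts
    return weights @ values                  # weighted evidence vector

query = torch.randn(4, 128)
keys, values = torch.randn(50, 128), torch.randn(50, 128)
evidence = memory_read(query, keys, values)  # -> (4, 128)
```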

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog [article]

Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao
2019 arXiv   pre-print
This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image.  ...  In each question-answering turn of a dialog, ReDAN infers the answer progressively through multiple reasoning steps.  ...  question answering (VQA) [3, 14, 1] and visual dialog [9, 11, 10].  ... 
arXiv:1902.00579v2 fatcat:mfoiesgitrdy7fsffciqckqabq
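The multi-step reasoning summarized above refines an answer over several attention passes. The loop below sketches only that generic refinement over image regions; ReDAN's actual dual attention also covers the dialog history, and every name and shape here is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def multi_step_attend(query, regions, steps=3):
    """Iterative attention refinement (sketch).

    query:   (batch, d)     current question/dialog representation
    regions: (batch, n, d)  image region features
    """
    for _ in range(steps):
        scores = torch.einsum('bd,bnd->bn', query, regions)
        weights = F.softmax(scores, dim=-1)              # attend to regions
        evidence = torch.einsum('bn,bnd->bd', weights, regions)
        query = query + evidence    # fold the new evidence into the query
    return query

refined = multi_step_attend(torch.randn(2, 128), torch.randn(2, 36, 128))
```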

Learning Visual Knowledge Memory Networks for Visual Question Answering [article]

Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen, Jianguo Li
2018 arXiv   pre-print
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions cannot be directly or clearly answered from visual content but require reasoning  ...  Second is the mechanism for handling multiple knowledge facts expanded from question and answer pairs.  ...  Acknowledgements Thanks to Zhiqiang Shen for helping prepare some illustrations for our early submissions.  ... 
arXiv:1806.04860v1 fatcat:iwjys34vfjdwjckig36gg5xqjq

ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering [article]

Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia
2016 arXiv   pre-print
We propose a novel attention based deep learning architecture for visual question answering task (VQA).  ...  Given an image and an image related natural language question, VQA generates the natural language answer for the question.  ...  With the attention map m, we can improve the question answering accuracy on various classes of questions for the following reasons: • For counting questions, such as "how many cars in the image?"  ... 
arXiv:1511.05960v2 fatcat:n2tkajusrzeiveovk6ssn6vtye
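The attention map m the snippet discusses is, in ABC-CNN's formulation, produced by a question-configured convolution over the image feature map. The sketch below shows that idea with a question-derived 1x1 kernel; all shapes and names are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, h, w = 256, 14, 14
img_feat = torch.randn(1, d, h, w)      # CNN feature map of the image
q_embed = torch.randn(1, 512)           # question embedding (e.g. from an LSTM)

to_kernel = nn.Linear(512, d)           # question -> 1x1 convolution kernel
kernel = to_kernel(q_embed).view(1, d, 1, 1)

scores = F.conv2d(img_feat, kernel)     # (1, 1, h, w) question-image relevance
attn = F.softmax(scores.view(1, -1), dim=-1).view(1, 1, h, w)  # attention map m
attended = (img_feat * attn).sum(dim=(2, 3))  # (1, d) attention-weighted feature
```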

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization [article]

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
2018 arXiv   pre-print
Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training such as overwhelmingly reporting the  ...  We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models -- achieving state-of-the-art on this task.  ...  Introduction The task of answering questions about visual content -called Visual Question Answering (VQA)presents a rich set of artificial intelligence challenges spanning computer vision and natural language  ... 
arXiv:1810.03649v2 fatcat:c66zkuhbezbdjd6x22ghdfhtiy
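The adversarial regularizer described here pits a question-only classifier against the model's question encoding: the adversary learns to predict answers from the question alone, while the encoder is pushed in the opposite direction. One standard way to wire up such a coupling is a gradient reversal layer, sketched below; the layer choice and all names are illustrative, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity forward, sign-flipped gradient backward (sketch)."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # the encoder receives the reversed gradient

# Hypothetical stand-ins: a question encoding and a question-only adversary.
q_encoding = torch.randn(4, 512, requires_grad=True)
adversary = torch.nn.Linear(512, 10)   # predicts the answer from question only
labels = torch.randint(0, 10, (4,))

adv_logits = adversary(GradReverse.apply(q_encoding))
adv_loss = F.cross_entropy(adv_logits, labels)
adv_loss.backward()  # adversary improves; the encoder is penalized for
                     # carrying answer-predictive (language-prior) information
```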

Semi-supervised Visual Feature Integration for Pre-trained Language Models [article]

Lisai Zhang, Qingcai Chen, Dongfang Li, Buzhou Tang
2020 arXiv   pre-print
Integrating visual features has been proven useful for natural language understanding tasks.  ...  Considering that our framework only requires an image database and does not require further alignments, it provides an efficient and feasible way for multimodal language learning.  ...  Visual Question Answering The visual question answering research [17] is also related to our work because it uses images in a way similar to ours: deep models reason about linguistic and visual  ... 
arXiv:1912.00336v2 fatcat:zqhqmadyrnerzpso7m4htqw3j4

Cross-Modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan
2020 Pattern Recognition  
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.  ...  On top of these new representations, we re-formulate Knowledge-based Visual Question Answering as a recurrent reasoning process for obtaining complementary evidence from multimodal information.  ...  Related Work Visual Question Answering The Visual Question Answering (VQA) task requires the agent to answer a question in natural language according to the visual content in an image, which demands  ... 
doi:10.1016/j.patcog.2020.107563 fatcat:ezlkrzacbnddfh7f573vqouhne

Benchmarking Deep Learning Models for Classification of Book Covers

Adriano Lucieri, Huzaifa Sabir, Shoaib Ahmed Siddiqui, Syed Tahseen Raza Rizvi, Brian Kenji Iwana, Seiichi Uchida, Andreas Dengel, Sheraz Ahmed
2020 SN Computer Science  
To answer this question, this paper makes a three-fold contribution.  ...  Third, it uses explicit attention mechanisms to identify the regions that the network focused on in order to make the prediction.  ...  We thank all members of the Deep Learning Competence Center at the DFKI for their comments and support.  ... 
doi:10.1007/s42979-020-00132-z fatcat:my6vmfqc7vauzouh7ndqbphlci

Visual Question Reasoning on General Dependency Tree [article]

Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin
2018 arXiv   pre-print
The collaborative reasoning for understanding each image-question pair is critical but under-explored for an interpretable Visual Question Answering (VQA) system.  ...  This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to  ...  Related Works Visual question answering The visual question answering task requires co-reasoning over both image and text to infer the correct answer.  ... 
arXiv:1804.00105v1 fatcat:4xuymhxvbnf4jnt3tg4z372cxm
Showing results 1 — 15 out of 9,945 results