
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering [article]

Yangyang Guo and Zhiyong Cheng and Liqiang Nie and Yibing Liu and Yinglong Wang and Mohan Kankanhalli
2019 arXiv   pre-print
Benefiting from the advancement of computer vision, natural language processing and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a  ...  ., score regularization module) to enhance current VQA models by alleviating the language prior problem as well as boosting the backbone model performance.  ...  language prior effect (called LP score) and design a generalized regularization method to alleviate the language prior problem in VQA.  ... 
arXiv:1905.04877v1 fatcat:ep3be7qhvndu5gloswvodregwa

Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [article]

Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang
2021 arXiv   pre-print
Recent studies have pointed out that many well-developed Visual Question Answering (VQA) models are heavily affected by the language prior problem, which refers to making predictions based on the co-occurrence  ...  In this paper, we propose to interpret the language prior problem in VQA from a class-imbalance view.  ...  LM, LMH, CSS and CSS+LMH are developed to intentionally alleviate the language prior problem in VQA.  ... 
arXiv:2010.16010v4 fatcat:kkn7ire36fa5rngbmqslgyz3se
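The class-imbalance view above suggests re-weighting the training loss by answer frequency, so that rare answers are not drowned out by frequent ones. A minimal sketch of one such inverse-frequency re-scaling (an illustration of the general idea, not the paper's exact scheme; all names and numbers here are hypothetical) is:

```python
import math

def rescaled_nll(probs, target, class_counts):
    """Negative log-likelihood scaled by an inverse-frequency class
    weight: losses on rare answer classes count for more, so the model
    cannot minimize training loss by always predicting frequent answers."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    weight = total / (n_classes * class_counts[target])
    return -weight * math.log(probs[target])

# Toy example: answer class 0 is 50x more frequent than class 2.
probs = [0.7, 0.2, 0.1]   # model's predicted answer distribution
counts = [1000, 100, 20]  # answer frequencies in the training set
common = rescaled_nll(probs, 0, counts)  # target is the frequent class
rare = rescaled_nll(probs, 2, counts)    # target is the rare class
```

Even though the model assigns the rare class some probability, its loss dominates (`rare > common`), pushing gradients toward under-represented answers.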

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning

Yiyi Zhou, Rongrong Ji, Jinsong Su, Xiangming Li, Xiaoshuai Sun
In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which commonly exists in most VQA models and forces the models to mainly rely on the question content to "guess" the answer, without regard to the visual information.  ...  Conclusion In this paper, we address the issue of knowledge inertia in Visual Question Answering, which is mainly caused by the strong language priors.  ... 
doi:10.1609/aaai.v33i01.33019316 fatcat:u6qtr2pm5ffanpkcqsjjiaa3fu

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering [article]

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
2017 arXiv   pre-print
We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter!  ...  Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the  ... 
arXiv:1612.00837v3 fatcat:q7yjwpu4w5c55ekgjejtsr6bfu

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" [article]

Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
2020 arXiv   pre-print
Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception.  ...  To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer  ...  Acknowledgement We would like to thank Pengchuan Zhang for insightful discussions and Drew Hudson for helpful input during her visit at Microsoft Research.  ... 
arXiv:2006.11524v3 fatcat:ohfd3uad2jd6fo4mfgfd26pa5q

WeaQA: Weak Supervision via Captions for Visual Question Answering [article]

Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
2021 arXiv   pre-print
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.  ...  Additionally, we demonstrate the efficacy of spatial-pyramid image patches as a simple but effective alternative to dense and costly object bounding box annotations used in existing VQA models.  ...  Acknowledgements The authors acknowledge support from the DARPA SAIL-ON program W911NF2020006, ONR award N00014-20-1-2332, and NSF grant 1816039, and the anonymous reviewers for their insightful discussion  ... 
arXiv:2012.02356v2 fatcat:yoqklfrx2vhctm7u24elycwwsi

AI Student: A Machine Reading Comprehension System for the Korean College Scholastic Ability Test

Gyeongmin Kim, Soomin Lee, Chanjun Park, Jaechoon Jo
2022 Mathematics  
Machine reading comprehension is a question answering mechanism in which a machine reads, understands, and answers questions from a given text.  ...  In this paper, we propose a novel Korean CSAT Question and Answering (KCQA) model and effectively utilize four easy data augmentation strategies with round trip translation to augment the insufficient  ...  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Data Availability Statement: Not applicable.  ... 
doi:10.3390/math10091486 fatcat:3nh6l4jshfcvdgfzoo5v3tmgda

Ask to Understand: Question Generation for Multi-hop Question Answering [article]

Jiawei Li, Mucheng Ren, Yang Gao, Yizhe Yang
2022 arXiv   pre-print
Multi-hop Question Answering (QA) requires the machine to answer complex questions by finding scattered clues and reasoning from multiple documents.  ...  the QG module could generate better sub-questions than QD methods in terms of fluency, consistency, and diversity.  ...  Does it alleviate the shortcut problem by adding a question generation module?  ... 
arXiv:2203.09073v1 fatcat:3i34k5linfbfxppvv5mu7ub5g4

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumder, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  More recently, this has enhanced research interests in the intersection of the Vision and Language arena with its numerous applications and fast-paced growth.  ...  As a broader scale application, VQAwCo poses the problem of visual question answering based on a collection of videos or photos [99] .  ... 
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

The meaning of "most" for visual question answering models [article]

Alexander Kuhnle, Ann Copestake
2019 arXiv   pre-print
The correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms.  ...  Our aim is to identify what strategy deep learning models for visual question answering learn when trained on such questions.  ...  RELATED WORK Visual question answering (VQA) is the general task of answering questions about visual scenes.  ... 
arXiv:1812.11737v2 fatcat:75k2t7ldjrflxd6glai3rwh2ou
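As background for the entry above: the standard generalized-quantifier truth condition for "most" (the textbook semantics against which such models are probed, not the paper's learned strategy) says that "most A are B" holds when more than half of the restrictor set A also falls in the scope set B. A minimal sketch with a hypothetical toy scene:

```python
def most(restrictor, scope):
    """Generalized-quantifier reading of "most A are B": more than half
    of the restrictor set A is also in the scope set B."""
    a = set(restrictor)
    return len(a & set(scope)) > len(a) / 2

# Toy scene: 3 of the 4 squares are red, plus one red circle.
squares = {"s1", "s2", "s3", "s4"}
red = {"s1", "s2", "s3", "c1"}
answer = most(squares, red)  # "Are most squares red?"
```

The non-trivial part for a learned model is that this requires comparing two cardinalities over the visual scene rather than matching surface features.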

Interventional Video Grounding with Dual Contrastive Learning [article]

Guoshun Nan, Rui Qiao, Yao Xiao, Jun Liu, Sicong Leng, Hao Zhang, Wei Lu
2021 arXiv   pre-print
Existing approaches focus more on the alignment of visual and language stimuli with various likelihood-based matching or regression strategies, i.e., P(Y|X).  ...  Consequently, these models may suffer from spurious correlations between the language and video features due to the selection bias of the dataset. 1) To uncover the causality behind the model and data,  ...  -02), the Agency for Science, Technology and  ... 
arXiv:2106.11013v2 fatcat:orjpyddcfjcyrhaanxx7yqc7ni

From HR to the C-Suite: Speaking the Same Language

Mike Psenka
2013 Employment Relations Today  
Providing ready answers to those questions, and how they affect your program, increases C-suite confidence that significant risks have been addressed.  ...  If HR can convey a detailed plan to alleviate the organization's pain points, the C-suite is more likely to listen and approve.  ... 
doi:10.1002/ert.21399 fatcat:7q4wprwsy5fgdbuknkck4np4w4

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning [article]

Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
2017 arXiv   pre-print
We introduce the first goal-driven training for visual question answering and dialog agents.  ...  Thus, we demonstrate the emergence of grounded language and communication among 'visual' dialog agents with no human supervision.  ...  Views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S.  ... 
arXiv:1703.06585v2 fatcat:6gmbr5hcgbhurdapclicxci33q

Answer Questions with Right Image Regions: A Visual Attention Regularization Approach [article]

Yibing Liu, Yangyang Guo, Jianhua Yin, Xuemeng Song, Weifeng Liu, Liqiang Nie
2021 arXiv   pre-print
Visual attention in Visual Question Answering (VQA) aims at locating the right image regions regarding the answer prediction, offering a powerful technique to promote multi-modal understanding.  ... 
arXiv:2102.01916v2 fatcat:tbxv3lq5nve2zm3htyzutxe3oe

Reinventing the Wheel: Explaining Question Duplication in Question Answering Communities

Xiaohui Liu, Yijing Li, Fei Liu, Zhao Cai, Eric T. K. Lim
2019 International Conference on Information Systems  
Results revealed that while the credibility of both questions and answers could alleviate question duplication, visual and actionable elements are more effective in preventing question duplication by boosting  ...  Duplicate questions are common occurrences in Question Answering Communities (QACs) and impede the development of efficacious problem-solving communities.  ...  Acknowledgements Work in this paper was supported by the National Natural Science Foundation of China (NSFC: 71801204)  ... 
dblp:conf/icis/LiuL0CL19 fatcat:d24pmvbi4rbthc4cyynjofud4a