39 hits in 4.7 sec

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge [article]

Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
2019 arXiv   pre-print
In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods ... Our new dataset includes more than 14,000 questions that require external knowledge to answer. ... Our contributions are: (a) we introduce the OK-VQA dataset, which includes only questions that require external resources to answer; (b) we benchmark some state-of-the-art VQA models on our new dataset ...
arXiv:1906.00067v2
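For readers who want to poke at the benchmark, the released annotations follow the VQA-style JSON layout. A minimal loading sketch, assuming the standard train-split file names from the OK-VQA download (adjust the paths to your copy):

```python
# Iterate over OK-VQA question/annotation pairs.
# File names follow the VQA-v2-style layout; treat exact paths as assumptions.
import json

with open("OpenEnded_mscoco_train2014_questions.json") as f:
    questions = {q["question_id"]: q for q in json.load(f)["questions"]}
with open("mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

for ann in annotations[:5]:
    q = questions[ann["question_id"]]
    answers = [a["answer"] for a in ann["answers"]]  # multiple human answers per question
    print(q["image_id"], q["question"], answers)
```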

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Abstract as in the arXiv pre-print above.
doi:10.1109/cvpr.2019.00331 dblp:conf/cvpr/MarinoRFM19

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering [article]

Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral
2021 arXiv   pre-print
Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. ... One dataset most often used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. ... Although the OK-VQA benchmark encourages a VQA system to rely on external resources to answer the question, it does not provide a knowledge corpus for a QA system to use. ...
arXiv:2109.04014v1
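The retriever half of such a pipeline can be approximated with any off-the-shelf sparse retriever. A minimal sketch using the rank_bm25 package over a made-up stand-in corpus (OK-VQA itself ships no gold knowledge corpus, which is the paper's point); a reader model would then extract the answer span from the retrieved passage:

```python
# Illustrative sparse retriever for a visual-retriever-reader pipeline.
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "A fire hydrant is connected to a water main and used by firefighters.",
    "Bananas are rich in potassium and ripen from green to yellow.",
    "The Statue of Liberty was a gift from France to the United States.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "what mineral are bananas known for"
top = bm25.get_top_n(query.lower().split(), corpus, n=1)
print(top[0])  # the banana passage; a reader would extract "potassium"
```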

Cross-Modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan
2020 Pattern Recognition  
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. ... We achieve new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA, and demonstrate the effectiveness and interpretability of our model with extensive ... This increases the difficulty of understanding the questions accurately. (2) OK-VQA requires a wide range of knowledge beyond a specific knowledge base. ...
doi:10.1016/j.patcog.2020.107563
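The cross-modal reasoning in this line of work comes down to message passing over a graph whose nodes mix visual, semantic, and factual evidence. A schematic single step in NumPy, purely illustrative and not the paper's actual memory-based architecture:

```python
# One toy round of neighbour aggregation over a small multimodal graph.
import numpy as np

# 4 nodes: two visual objects, one semantic concept, one fact node (illustrative).
features = np.random.randn(4, 8)             # toy node embeddings
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)  # undirected edges
deg = adj.sum(axis=1, keepdims=True)

# Each node averages its neighbours' features, then mixes with its own state.
messages = (adj @ features) / deg
updated = 0.5 * features + 0.5 * messages
print(updated.shape)  # (4, 8)
```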

Visual Question Answering for Cultural Heritage [article]

Pietro Bongini, Federico Becattini, Andrew D. Bagdanov, Alberto Del Bimbo
2020 arXiv   pre-print
A popular emerging trend in computer vision is Visual Question Answering (VQA), in which users can interact with a neural network by posing questions in natural language and receiving answers about the ... Usually this additional knowledge comes both from the artwork itself (and therefore the image depicting it) and from an external source of knowledge, such as an information sheet. ... We take from VQA v2 a number of visual questions equal to the number of questions that require external knowledge from OK-VQA. The obtained dataset is then split into train and test sets. ...
arXiv:2003.09853v1
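The dataset-construction step quoted above is easy to picture in code. A sketch assuming the VQA v2 and OK-VQA question lists are already in memory; the toy list sizes and the 80/20 split ratio are assumptions, not taken from the paper:

```python
# Draw as many VQA-v2 questions as there are knowledge-requiring OK-VQA
# questions, combine, then split into train and test sets.
import random

random.seed(0)
vqa_v2_questions = [f"vqa_q{i}" for i in range(100)]  # placeholder data
okvqa_questions = [f"okvqa_q{i}" for i in range(40)]  # placeholder data

sampled = random.sample(vqa_v2_questions, k=len(okvqa_questions))
combined = sampled + okvqa_questions
random.shuffle(combined)

split = int(0.8 * len(combined))  # 80/20 ratio is an assumption
train, test = combined[:split], combined[split:]
print(len(train), len(test))
```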

KAT: A Knowledge Augmented Transformer for Vision-and-Language [article]

Liangke Gui, Borui Wang, Qiuyuan Huang, Alex Hauptmann, Yonatan Bisk, Jianfeng Gao
2022 arXiv   pre-print
OK-VQA. ... In this work, we ask a different question: Can multimodal transformers leverage explicit knowledge in their reasoning? ... Knowledge-based VQA. Some knowledge-based visual-language tasks require external knowledge beyond the image to answer a question. ...
arXiv:2112.08614v2
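KAT's core idea is to serialize retrieved knowledge next to the question so a seq2seq transformer can reason over both. A heavily simplified sketch with an off-the-shelf T5 checkpoint; the real model is FiD-style with separate explicit and implicit knowledge streams, so treat this as the flavour of the input only:

```python
# Serialize question + retrieved knowledge into one seq2seq input (illustrative).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "What mineral is this fruit known for?"
knowledge = ["banana: rich in potassium", "potassium: dietary mineral"]  # made up
prompt = f"question: {question} context: " + " | ".join(knowledge)

ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```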

MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants [article]

Alkesh Patel, Joel Ruben Antony Moniz, Roman Nguyen, Nick Tzou, Hadas Kotek, Vincent Renkens
2021 arXiv   pre-print
The research in visual question answering (VQA) and visual question generation (VQG) is a great step forward. ... However, they do not capture questions that a visually-abled person would ask multimodal assistants. Moreover, the questions often do not seek information from external knowledge. ... The questions shown along with images may not require help from a digital assistant for a visually-abled person, as the answers seem obvious. ...
arXiv:2110.06416v2

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [article]

Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
2021 arXiv   pre-print
Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. ... Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and the question for answer prediction. ... The problem of knowledge-based visual question answering (VQA) (Marino et al. 2019) extends the standard VQA task (Antol et al. 2015) by asking questions that require outside knowledge ...
arXiv:2109.05014v1
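The few-shot recipe studied here replaces explicit retrieval with prompting: the image is converted to a textual caption and GPT-3 answers from in-context examples. A sketch of the prompt assembly, with made-up captions and demonstrations:

```python
# Build a PICa-style few-shot prompt from (caption, question, answer) demos.
def build_prompt(caption, question, shots):
    header = "Answer the question using the image description.\n\n"
    demos = "".join(
        f"Context: {c}\nQ: {q}\nA: {a}\n\n" for c, q, a in shots
    )
    return header + demos + f"Context: {caption}\nQ: {question}\nA:"

shots = [
    ("A man rides a wave on a surfboard.", "What sport is this?", "surfing"),
]
prompt = build_prompt("A bowl of sliced bananas.",
                      "What mineral is this fruit rich in?", shots)
print(prompt)
# The resulting string would be sent to a GPT-3-style completion endpoint.
```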

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [article]

Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu
2020 arXiv   pre-print
Fact-based Visual Question Answering (FVQA) requires external knowledge beyond visible content to answer questions about an image, which is challenging but indispensable for achieving general VQA. ... How to capture question-oriented and information-complementary evidence remains a key challenge in solving the problem. ... This indicates that a general VQA task like OK-VQA cannot simply be solved by a well-designed model, but requires the ability to incorporate external knowledge in an effective way. ...
arXiv:2006.09073v3

Generating Natural Questions from Images for Multimodal Assistants [article]

Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams
2020 arXiv   pre-print
Recently published datasets such as KB-VQA, FVQA, and OK-VQA try to collect questions that look for external knowledge, which makes them appropriate for multimodal assistants. ... The research in visual question answering (VQA) and visual question generation (VQG) is a great step forward. ... with external knowledge. ...
arXiv:2012.03678v1

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking [article]

Dirk Väth, Pascal Tilli, Ngoc Thang Vu
2021 arXiv   pre-print
On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. ... Our metrics allow us to quantify which image and question embeddings provide the most robustness to a model. All code is publicly available. ... Marino et al. (2019) introduce a dataset that requires external knowledge to answer its questions, thereby motivating the integration of additional knowledge pools. TextVQA: Singh et al. ...
arXiv:2110.05159v1
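One way to picture an embedding-robustness metric of this kind: compare accuracy on clean versus perturbed inputs. The toy model and Gaussian perturbation below are stand-ins, not the tool's actual metrics:

```python
# Toy robustness probe: accuracy drop under input-embedding noise.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))       # fake question embeddings
labels = (emb[:, 0] > 0).astype(int)   # fake ground truth

def predict(x):                        # fake "model": threshold on one coordinate
    return (x[:, 0] > 0).astype(int)

clean_acc = (predict(emb) == labels).mean()
noisy = emb + rng.normal(scale=0.5, size=emb.shape)
noisy_acc = (predict(noisy) == labels).mean()
print(f"accuracy drop under noise: {clean_acc - noisy_acc:.3f}")
```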

Evaluating State-of-the-Art Visual Question Answering Models Ability to Answer Complex Counting Questions

Krish Gangaraju, Khaled Jedoui
2021 Journal of Student Research
Visual Question Answering (VQA) is a relatively new area of computer science involving computer vision, natural language processing, and deep learning. ... Since the original VQA dataset was made publicly available in 2014, we've seen datasets such as OK-VQA, Visual7W, and CLEVR that have all explored new concepts, with various algorithms exceeding previous ... OK-VQA augments the idea that to answer challenging questions, external knowledge, apart from the image itself, is required to fulfill this task. Visual Genome is one of the largest datasets allowing ...
doi:10.47611/jsrhs.v10i4.2446

KnowIT VQA: Answering Knowledge-Based Questions about Videos

Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
2020 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20)
We propose a novel video understanding task by fusing knowledge-based and video question answering. ... First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. ...
doi:10.1609/aaai.v34i07.6713

KnowIT VQA: Answering Knowledge-Based Questions about Videos [article]

Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
2019 arXiv   pre-print
Abstract as in the AAAI version above.
arXiv:1910.10706v3

K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [article]

Kohei Uehara, Tatsuya Harada
2022 arXiv   pre-print
Visual Question Generation (VQG) is a task to generate questions from images. When humans ask questions about an image, their goal is often to acquire some new knowledge. ... However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objective of knowledge acquisition. ... The OK-VQA dataset [17] is intended to be a VQA dataset that requires knowledge and is larger than the FVQA dataset (∼10K questions); however, it lacks annotations on "which knowledge is relevant to ...
arXiv:2203.07890v1
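The knowledge-aware setup pairs each generated question with the knowledge triple it is meant to elicit. A sketch of such a record; the field names and example values are illustrative, not the dataset's actual schema:

```python
# Pair a generated question with the knowledge triple it should teach.
from dataclasses import dataclass

@dataclass
class KnowledgeTriple:
    head: str
    relation: str
    tail: str

@dataclass
class KVQGExample:
    image_id: str
    target: KnowledgeTriple  # knowledge the generated question should elicit
    question: str

ex = KVQGExample(
    image_id="coco_000042",
    target=KnowledgeTriple("banana", "rich_in", "potassium"),
    question="What mineral is this fruit a good source of?",
)
print(ex.question, "->", ex.target)
```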
Showing results 1–15 of 39.