38,362 Hits in 6.9 sec

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information [article]

Seonhoon Kim, Inho Kang, Nojun Kwak
2018 arXiv   pre-print
Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features  ...  Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering.  ...  Visualization on the comparable models: We study how the attentive weights flow as layers get deeper in each model using dense or residual connections.  ... 
arXiv:1805.11360v2 fatcat:q7gksggltzhoxovdea6nmocfdq
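The DenseNet-style connectivity this abstract describes, in which each layer consumes the concatenation of all preceding layers' outputs, can be sketched as follows. This is a minimal illustration only: `toy_layer` is a hypothetical stand-in for the paper's recurrent co-attentive layers, and the feature values are invented.

```python
# Sketch of DenseNet-style dense connectivity over layer features: each layer's
# input is the concatenation of the original input and all previous outputs,
# so feature width grows with depth.

def toy_layer(x):
    """Hypothetical stand-in for a recurrent/co-attentive layer: scales features."""
    return [2.0 * v for v in x]

def densely_connected_forward(x0, num_layers):
    """Feed each layer the concatenation of the input and all earlier outputs."""
    features = [x0]                                        # running feature blocks
    for _ in range(num_layers):
        concat = [v for block in features for v in block]  # dense concatenation
        features.append(toy_layer(concat))
    return features[-1]

out = densely_connected_forward([1.0, 2.0], num_layers=2)
# layer 1 sees 2 features; layer 2 sees 2 + 2 = 4, so the final output has 4
```

Note how the second layer's input width (4) is the sum of all earlier block widths; this growth is the same pressure that motivates compression tricks such as the autoencoder mentioned in a later result.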

Deep learning method for visual question answering in the digital radiology domain

Dainius Gaidamavičius, Tomas Iešmantas
2022 Mathematical Models in Engineering  
This task falls into the so-called VQA area: Visual Question Answering.  ...  For the radiology image dataset VQA-2019 Med [1], the new method achieves 84.8 % compared to 82.2 % for other considered feature fusion methods.  ...  models for visual question-answering in the radiology domain that take questions and digital images as input and generate answers.  ... 
doi:10.21595/mme.2022.22737 fatcat:62b3pf3ii5dmrfcafrnxwydxeq

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [article]

Hyounghun Kim, Zineng Tang, Mohit Bansal
2020 arXiv   pre-print
to allow easier matching) for answering questions.  ...  Moreover, our model is also comprised of dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more  ...  Acknowledgments We thank the reviewers for their helpful comments.  ... 
arXiv:2005.06409v1 fatcat:xvjyiheaevc5phpruyt4ixil2q

An Improved Attention for Visual Question Answering [article]

Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
2021 arXiv   pre-print
We consider the problem of Visual Question Answering (VQA).  ...  The attention module generates a weighted average for each query.  ...  Visual Question Answering Antol et al.  ... 
arXiv:2011.02164v3 fatcat:xowtihjj35cgpp67rzxdds3n2m
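The "weighted average for each query" that this abstract refers to is standard dot-product attention: score each key against the query, softmax the scores, and average the values with those weights. A minimal sketch with invented toy vectors (no scaling factor, single head):

```python
import math

def attend(query, keys, values):
    """Return the softmax-weighted average of `values` for one `query`."""
    scores = [sum(qi * ki for qi, ki in zip(query, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# The query aligns with the first key, so the output leans toward the first value.
out = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

Because the weights sum to one, the output always stays inside the convex hull of the values, which is what makes it a weighted average rather than an arbitrary mixture.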

Semantic Sentence Matching with Densely-Connected Recurrent and Co-Attentive Information

Seonhoon Kim, Inho Kang, Nojun Kwak
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)  
Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features  ...  Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering.  ...  In question answering, sentence matching is required to determine the degree of matching 1) between a query and a question for question retrieval, and 2) between a question and an answer for answer selection  ... 
doi:10.1609/aaai.v33i01.33016586 fatcat:mniquud2fbdfpmisfnyxqgzdi4

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space [article]

Liu Yang
2022 arXiv   pre-print
MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog  ...  For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vision-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation  ...  Related Work Attention Mechanism has been widely used to address image caption and visual question answering (VQA) tasks.  ... 
arXiv:2008.00397v2 fatcat:2e5rgo56ynatlktnlsp7s4ravy

Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering [article]

Mingxiao Li, Marie-Francine Moens
2022 arXiv   pre-print
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not present in the given image.  ...  Then, this representation is used to guide a graph attention operator over the spatial-aware image graph. Our model achieves new state-of-the-art accuracy on the KRVQR and FVQA datasets.  ...  a fully connected image graph where nodes represent region features, and iteratively perform question-guided inter- and intra-graph attention to answer the question.  ... 
arXiv:2203.02985v1 fatcat:nzovyyeamzhxhiczdeharlrlw4

Dynamic Key-Value Memory Enhanced Multi-Step Graph Reasoning for Knowledge-Based Visual Question Answering

Mingxiao Li, Marie-Francine Moens
2022 Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)  
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not present in the given image.  ...  Then, this representation is used to guide a graph attention operator over the spatial-aware image graph. Our model achieves new state-of-the-art accuracy on the KRVQR and FVQA datasets.  ...  a fully connected image graph where nodes represent region features, and iteratively perform question-guided inter- and intra-graph attention to answer the question.  ... 
doi:10.1609/aaai.v36i10.21346 fatcat:5mfq3vzq5vbqzpmj4qorih4wmi

A Densely Connected GRU Neural Network Based on Coattention Mechanism for Chinese Rice-Related Question Similarity Matching

Haoriqin Wang, Huaji Zhu, Huarui Wu, Xiaomin Wang, Xiao Han, Tongyu Xu
2021 Agronomy  
In the question-and-answer (Q&A) communities of the "China Agricultural Technology Extension Information Platform", thousands of rice-related Chinese questions are newly added every day.  ...  To alleviate the problem of feature vector size increasing due to dense concatenation, an autoencoder was used after dense concatenation.  ...  (a): Visualization of attention weight of rice-related question similarity with layer 1. (b): Visualization of attention weight of rice-related question similarity with layer 3.  ... 
doi:10.3390/agronomy11071307 fatcat:qag5pszb5fh35m5whxvimlrjhu

VD-BERT: A Unified Vision and Dialog Transformer with BERT [article]

Yue Wang, Shafiq Joty, Michael R. Lyu, Irwin King, Caiming Xiong, Steven C.H. Hoi
2020 arXiv   pre-print
Visual dialog is a challenging vision-language task, where a dialog agent needs to answer a series of questions through reasoning on the image content and dialog history.  ...  More crucially, we adapt BERT for the effective fusion of vision and dialog contents via visually grounded training.  ...  Acknowledgements We thank Chien-Sheng Wu, Jiashi Feng, Jiaxin Qi, and our anonymous reviewers for their insightful feedback on our paper.  ... 
arXiv:2004.13278v3 fatcat:g3lgmcetzjainh65jkppzzw4de

CQ-VQA: Visual Question Answering on Categorized Questions [article]

Aakansha Mishra, Ashish Anand, Prithwijit Guha
2020 arXiv   pre-print
This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA).  ...  The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space.  ...  For example, studies in [19, 23] have shown that along with question-guided attention on the image, attention from image to questions allows better information flow and interaction between the two modalities  ... 
arXiv:2002.06800v1 fatcat:mwkyk3djyracxbvzrwvbh4f6ke
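The two-level idea in this abstract, first route a question to a category and only then search among that category's candidate answers, can be sketched with a toy rule-based categorizer. Everything below is invented for illustration: the paper uses learned classifiers over real VQA question categories, not keyword rules.

```python
# Toy sketch of hierarchical answer-space reduction: a question categorizer
# picks a category, and answer search is restricted to that category's list.
# Categories, rules, and candidates are hypothetical.

CANDIDATES = {
    "color":  ["red", "green", "blue"],
    "count":  ["one", "two", "three"],
    "yes/no": ["yes", "no"],
}

def categorize(question):
    """Stand-in for the learned question categorizer (QC)."""
    q = question.lower()
    if q.startswith(("is", "are", "does")):
        return "yes/no"
    if "how many" in q:
        return "count"
    if "color" in q:
        return "color"
    return "yes/no"  # arbitrary fallback for the sketch

def answer_search_space(question):
    """Categorization shrinks the answer search space to one candidate list."""
    return CANDIDATES[categorize(question)]

space = answer_search_space("How many dogs are in the image?")
```

The second-level answer predictor then only has to discriminate among `len(space)` candidates instead of the full answer vocabulary, which is the efficiency argument the abstract makes.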

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering [article]

Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
2019 arXiv   pre-print
Learning effective fusion of multi-modality features is at the heart of visual question answering.  ...  We also show that the proposed dynamic intra-modality attention flow conditioned on the other modality can dynamically modulate the intra-modality attention of the target modality, which is vital for multimodality  ...  Flow (DFAF) for visual question answering.  ... 
arXiv:1812.05252v4 fatcat:yuh6ctvafzb6dhlrnq2qqxbzrm
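The intra- and inter-modality flows this abstract combines differ only in where queries, keys, and values come from: intra-modality (self) attention stays within one modality, while inter-modality (cross) attention lets one modality's queries attend over the other's features. A hedged sketch with toy 2-d features; the "visual"/"textual" vectors are illustrative, not DFAF's actual fused representations:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(queries, keys, values):
    """Each query receives a softmax-weighted average of the values."""
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs

visual = [[1.0, 0.0], [0.0, 1.0]]   # toy image-region features
textual = [[1.0, 1.0]]              # toy word feature

intra = attention(visual, visual, visual)   # regions attend to each other
inter = attention(textual, visual, visual)  # the word attends to regions
```

DFAF's "dynamic" variant additionally conditions the intra-modality weights on the other modality; the sketch shows only the static skeleton the two flows share.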

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data [article]

Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
2020 arXiv   pre-print
We evaluate the proposed method on the two different tasks for video understanding: Video theme classification (Youtube-8M dataset) and Video Question and Answering (TVQA dataset).  ...  Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.  ...  Acknowledgements The authors would like to thank Woo Suk Choi and Chris Hickey for helpful comments and editing.  ... 
arXiv:2001.07613v1 fatcat:ytasc3bvube3zok66pfdhaaw44

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
We evaluate the proposed method on the two different tasks for video understanding: Video theme classification (Youtube-8M dataset (Abu-El-Haija et al. 2016)) and Video Question and Answering (TVQA dataset  ...  Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.  ...  Acknowledgements The authors would like to thank Woo Suk Choi and Chris Hickey for helpful comments and editing.  ... 
doi:10.1609/aaai.v34i04.5978 fatcat:fafwr2admzejlp3rjgf2pq5uiq

Guest Editorial: Spatio-temporal Feature Learning for Unconstrained Video Analysis

Yahong Han, Liqiang Nie, Fei Wu
2018 Multimedia tools and applications  
., video classification, video summarization, visual attention prediction, and video  ...  Previous studies mainly focus on the hand-crafted video descriptors, e.g., STIP, MoSIFT, Dense Trajectory, etc.  ...  Visual Question Answering (VQA) is a recent hot topic and the challenge lies in that, in most cases, it requires reasoning over the connection between visual content and language.  ... 
doi:10.1007/s11042-018-6341-6 fatcat:wsp2hi2gyfgkra6yr2pvo2wbgy
Showing results 1 — 15 out of 38,362 results