7,415 Hits in 8.8 sec

MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [article]

Haiwei Pan, Shuning He, Kejia Zhang, Bo Qu, Chunling Chen, Kun Shi
2021 arXiv   pre-print
Since most current medical VQA models focus on visual content, ignoring the importance of text, this paper proposes a multi-view attention-based model (MuVAM) for medical visual question answering which  ...  Multi-view attention can correlate the question with image and word in order to better analyze the question and get an accurate answer.  ...  If the deep learning model pre-trained on the general VQA dataset is transferred to medical VQA and fine-tuned with a small amount of medical images, the final effect is not satisfactory due to the obvious  ... 
arXiv:2107.03216v1 fatcat:4yqcxspwjrdphl6x22naqvh2aq

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data.  ...  In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  to answer a wide and deep set of related real-world questions.  ... 
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Khaled Bayoudh, Raja Knani, Fayçal Hamdaoui, Abdellatif Mtibaa
2021 The Visual Computer  
schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning.  ...  In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based  ...  To this end, the agent must first explore its environment, capture visual information, and then answer the question posed.  ... 
doi:10.1007/s00371-021-02166-7 pmid:34131356 pmcid:PMC8192112 fatcat:jojwyc6slnevzk7eaiutlmlgfe

Multiple Meta-model Quantifying for Medical Visual Question Answering [article]

Tuong Do, Binh X. Nguyen, Erman Tjiputra, Minh Tran, Quang D. Tran, Anh Nguyen
2021 arXiv   pre-print
Transfer learning is an important step to extract meaningful features and overcome the data limitation in the medical Visual Question Answering (VQA) task.  ...  However, most of the existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset is not fully utilized.  ...  Introduction A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process.  ... 
arXiv:2105.08913v2 fatcat:bm2gfgl2evhldjgahjbpjmexmu

A Review on Explainability in Multimodal Deep Neural Nets

Gargi Joshi, Rahee Walambe, Ketan Kotecha
2021 IEEE Access  
Several multimodal fusion methods employing deep learning models are proposed in the literature.  ...  INDEX TERMS deep multimodal learning, explainable AI, interpretability, survey, trends, vision and language research, XAI.  ...  Hence there is an urgent need to understand the approaches, methods, and techniques for multimodal data fusion and build an integrative framework for developing tools and applications in various disciplines  ... 
doi:10.1109/access.2021.3070212 fatcat:5wtxr4nf7rbshk5zx7lzbtcram

Machine learning for big visual analysis

Jun Yu, Xue Mei, Fatih Porikli, Jason Corso
2018 Machine Vision and Applications  
Acknowledgements The work was supported in part by the NSFC-61622205 and in part by the NSFC-61472110.  ...  Deep learning for visual analysis The article entitled "Panchromatic and multi-spectral image fusion for new satellites based on multi-channel deep model" proposed a novel method based on the multi-channel  ...  ., sparse coding has been successfully used for visual object recognition that models the human visual system; multitask learning can efficiently achieve neural generative question answering; discriminant  ... 
doi:10.1007/s00138-018-0948-5 fatcat:puwirktcpjg5bdfc4wxvuw77ua

Medical Visual Question Answering: A Survey [article]

Zhihong Lin, Donghao Zhang, Qingyi Tao, Danli Shi, Gholamreza Haffari, Qi Wu, Mingguang He, Zongyuan Ge
2022 arXiv   pre-print
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.  ...  Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.  ...  Interpretability and Reliability Interpretability is a long-standing problem of deep learning.  ... 
arXiv:2111.10056v2 fatcat:4dihtqmptbgj5lozrv3lfxqv7q

On the Logical Design of a Prototypical Data Lake System for Biological Resources

Haoyang Che, Yucong Duan
2020 Frontiers in Bioengineering and Biotechnology  
analyses like reporting, modeling, data exploration, knowledge discovery, data visualization, advanced analysis, and machine learning.  ...  As an effective complement to those previous systems, data lakes were devised to store voluminous, varied, and diversely structured or unstructured data in their native formats, for the sake of various  ...  ACKNOWLEDGMENTS Thanks are due to the reviewers for their hard work in reading and proofreading the early version of the manuscript. We thank Sheng Zhang for helping submit the final manuscript.  ... 
doi:10.3389/fbioe.2020.553904 pmid:33117777 pmcid:PMC7552915 fatcat:fpizpjiahrc7tdkze5mkw6syzi

Causal Reasoning Meets Visual Representation Learning: A Prospective Study [article]

Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin
2022 arXiv   pre-print
consensus-building standards for reliable visual representation learning and related real-world applications more efficiently.  ...  In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets.  ...  In image object detection, deep learning frameworks for object detection are divided into two types.  ... 
arXiv:2204.12037v6 fatcat:upidzcsgubf2nkm5gieudz6jbu

Data Harmonization for Heterogeneous Datasets: A Systematic Literature Review

Ganesh Kumar, Shuib Basri, Abdullahi Abubakar Imam, Sunder Ali Khowaja, Luiz Fernando Capretz, Abdullateef Oluwagbemiga Balogun
2021 Applied Sciences  
(NLP), machine learning (ML), and deep learning (DL).  ...  The heterogeneity and decentralization of data sources affect data visualization and prediction, thereby influencing analytical results accordingly.  ...  Moreover, efficient and effective machine learning, deep learning, and NLP techniques for textual data will help with faster training and testing approaches.  ... 
doi:10.3390/app11178275 fatcat:2e5jcmsodrej3fwkxkftiwhrsu

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang
2018 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '18  
To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA.  ...  Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities.  ...  ACKNOWLEDGMENTS We would like to thank our anonymous reviewers for their constructive feedback and suggestions. This work was supported in part by the National Natural Science  ... 
doi:10.1145/3219819.3220036 dblp:conf/kdd/LuJZDZW18 fatcat:jnqklx52mrgobegy5h7nrkbipq

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion [article]

Yang Wang
2020 arXiv   pre-print
Throughout this survey, we further indicate that the critical components for this field go to collaboration, adversarial competition and fusion over multi-modal spaces.  ...  Substantial empirical studies are carried out to demonstrate its advantages that are benefited from deep multi-modal methods, which can essentially deepen the fusion from multi-modal deep feature spaces  ...  To answer this question, we need to resolve two basic questions, namely where and when to collaborate.  ... 
arXiv:2006.08159v1 fatcat:g4467zmutndglmy35n3eyfwxku

Cross-media analysis and reasoning: advances and directions

Yu-xin Peng, Wen-wu Zhu, Yao Zhao, Chang-sheng Xu, Qing-ming Huang, Han-qing Lu, Qing-hua Zheng, Tie-jun Huang, Wen Gao
2017 Frontiers of Information Technology & Electronic Engineering  
To address these issues, we provide an overview as follows: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge  ...  Cross-media analysis and reasoning is an active research area in computer science, and a promising direction for artificial intelligence.  ...  Acknowledgements The authors would like to thank Peng CUI, Shi-kui WEI, Ji-tao SANG, Shu-hui WANG, Jing LIU, and Bu-yue QIAN for their valuable discussions and assistance.  ... 
doi:10.1631/fitee.1601787 fatcat:dqnizhdlbfhpvodzkhv5nlarxq

Artificial intelligence for clinical decision support in neurology

Mangor Pedersen, Karin Verspoor, Mark Jenkinson, Meng Law, David F Abbott, Graeme D Jackson
2020 Brain Communications  
Despite the clinical promise of artificial intelligence, machine and deep learning algorithms are not a one-size-fits-all solution for all types of clinical data and questions.  ...  In this paper, we provide an overview of the core concepts of artificial intelligence, particularly contemporary deep learning methods, to give clinician and neuroscience researchers an appreciation of  ...  We also acknowledge the facilities, and the scientific and technical assistance of the National Imaging Facility (NIF), an Australian Government National Collaborative Research Infrastructure Strategy  ... 
doi:10.1093/braincomms/fcaa096 pmid:33134913 pmcid:PMC7585692 fatcat:6qqvkuhn3nfzjljaykc3ihvsge

Hierarchical Deep Multi-modal Network for Medical Visual Question Answering [article]

Deepak Gupta, Swati Suman, Asif Ekbal
2020 arXiv   pre-print
Visual Question Answering in Medical domain (VQA-Med) plays an important role in providing medical assistance to the end-users.  ...  To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction.  ...  The fundamental concept behind all these attentive models is that for answering a specific question, certain visual areas in an image and certain words in a question provide more information than others  ... 
arXiv:2009.12770v1 fatcat:d2dmtduat5b3bm4ujgyryh474y
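The shared idea behind the attentive VQA models described in the snippet above can be sketched in a few lines: a question vector scores each image region, and a softmax over those scores picks out the regions most relevant to answering the question. This is only an illustrative sketch, not the method of any listed paper; all names, shapes, and the dot-product scoring function are assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

def question_guided_attention(region_feats, question_vec):
    """region_feats: (num_regions, dim); question_vec: (dim,).

    Returns (weights, attended): a distribution over image regions and
    the question-weighted sum of region features."""
    scores = region_feats @ question_vec   # relevance of each region to the question
    weights = softmax(scores)              # normalize scores to a distribution
    attended = weights @ region_feats      # weighted sum of region features
    return weights, attended

rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))          # hypothetical: 4 image regions, dim 8
question = rng.normal(size=8)              # hypothetical encoded question vector
w, ctx = question_guided_attention(regions, question)
print(w)                                   # weights sum to 1; the largest marks the most relevant region
```

Real models replace the dot product with learned projections and attend over words of the question as well as image regions, but the weight-then-pool pattern is the same.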
Showing results 1 — 15 out of 7,415 results