MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
[article]
2021
arXiv
pre-print
Since most current medical VQA models focus on visual content, ignoring the importance of text, this paper proposes a multi-view attention-based model (MuVAM) for medical visual question answering which ...
Multi-view attention can correlate the question with the image and with individual words in order to better analyze the question and obtain an accurate answer. ...
If the deep learning model pre-trained on the general VQA dataset is transferred to medical VQA and fine-tuned with a small amount of medical images, the final effect is not satisfactory due to the obvious ...
arXiv:2107.03216v1
fatcat:4yqcxspwjrdphl6x22naqvh2aq
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
[article]
2020
arXiv
pre-print
Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in real-world data. ...
In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities. ...
to answer a wide and deep set of related real-world questions. ...
arXiv:2010.09522v2
fatcat:l4npstkoqndhzn6hznr7eeys4u
A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets
2021
The Visual Computer
In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. ...
To this end, the agent must first explore its environment, capture visual information, and then answer the question posed. ...
doi:10.1007/s00371-021-02166-7
pmid:34131356
pmcid:PMC8192112
fatcat:jojwyc6slnevzk7eaiutlmlgfe
Multiple Meta-model Quantifying for Medical Visual Question Answering
[article]
2021
arXiv
pre-print
Transfer learning is an important step to extract meaningful features and overcome the data limitation in the medical Visual Question Answering (VQA) task. ...
However, most of the existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset is not fully utilized. ...
Introduction A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. ...
arXiv:2105.08913v2
fatcat:bm2gfgl2evhldjgahjbpjmexmu
A Review on Explainability in Multimodal Deep Neural Nets
2021
IEEE Access
Several multimodal fusion methods employing deep learning models are proposed in the literature. ...
INDEX TERMS deep multimodal learning, explainable AI, interpretability, survey, trends, vision and language research, XAI. ...
Hence there is an urgent need to understand the approaches, methods, and techniques for multimodal data fusion and build an integrative framework for developing tools and applications in various disciplines ...
doi:10.1109/access.2021.3070212
fatcat:5wtxr4nf7rbshk5zx7lzbtcram
Machine learning for big visual analysis
2018
Machine Vision and Applications
Acknowledgements The work was supported in part by the NSFC-61622205 and in part by the NSFC-61472110. ...
Deep learning for visual analysis The article entitled "Panchromatic and multi-spectral image fusion for new satellites based on multi-channel deep model" proposed a novel method based on the multi-channel ...
sparse coding has been successfully used for visual object recognition that models the human visual system; multitask learning can efficiently achieve neural generative question answering; discriminant ...
doi:10.1007/s00138-018-0948-5
fatcat:puwirktcpjg5bdfc4wxvuw77ua
Medical Visual Question Answering: A Survey
[article]
2022
arXiv
pre-print
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. ...
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer. ...
Interpretability and Reliability Interpretability is a long-standing problem of deep learning. ...
arXiv:2111.10056v2
fatcat:4dihtqmptbgj5lozrv3lfxqv7q
On the Logical Design of a Prototypical Data Lake System for Biological Resources
2020
Frontiers in Bioengineering and Biotechnology
As an effective complement to those previous systems, data lakes were devised to store voluminous, varied, and diversely structured or unstructured data in their native formats, for the sake of various analyses like reporting, modeling, data exploration, knowledge discovery, data visualization, advanced analysis, and machine learning. ...
ACKNOWLEDGMENTS Thanks are due to the reviewers for their hard work at reading and proofreading the early version of the manuscript. We thank Sheng Zhang for helping submit the final manuscript. ...
doi:10.3389/fbioe.2020.553904
pmid:33117777
pmcid:PMC7552915
fatcat:fpizpjiahrc7tdkze5mkw6syzi
Causal Reasoning Meets Visual Representation Learning: A Prospective Study
[article]
2022
arXiv
pre-print
consensus-building standards for reliable visual representation learning and related real-world applications more efficiently. ...
In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. ...
In image object detection, deep learning frameworks for object detection are divided into two types. ...
arXiv:2204.12037v6
fatcat:upidzcsgubf2nkm5gieudz6jbu
Data Harmonization for Heterogeneous Datasets: A Systematic Literature Review
2021
Applied Sciences
(NLP), machine learning (ML), and deep learning (DL). ...
The heterogeneity and decentralization of data sources affect data visualization and prediction, thereby influencing analytical results accordingly. ...
Moreover, efficient and effective machine learning, deep learning, and NLP techniques for textual data will help with faster training and testing approaches. ...
doi:10.3390/app11178275
fatcat:2e5jcmsodrej3fwkxkftiwhrsu
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
2018
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. ...
Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. ...
ACKNOWLEDGMENTS We would like to thank our anonymous reviewers for their constructive feedback and suggestions. This work was supported in part by the National Natural Science ...
doi:10.1145/3219819.3220036
dblp:conf/kdd/LuJZDZW18
fatcat:jnqklx52mrgobegy5h7nrkbipq
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
[article]
2020
arXiv
pre-print
Throughout this survey, we further indicate that the critical components for this field go to collaboration, adversarial competition and fusion over multi-modal spaces. ...
Substantial empirical studies are carried out to demonstrate the advantages gained from deep multi-modal methods, which can essentially deepen the fusion of multi-modal deep feature spaces ...
To answer this question, we need to resolve two basic questions, namely where and when to collaborate. ...
arXiv:2006.08159v1
fatcat:g4467zmutndglmy35n3eyfwxku
Cross-media analysis and reasoning: advances and directions
2017
Frontiers of Information Technology & Electronic Engineering
To address these issues, we provide an overview as follows: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge ...
Cross-media analysis and reasoning is an active research area in computer science, and a promising direction for artificial intelligence. ...
Acknowledgements The authors would like to thank Peng CUI, Shi-kui WEI, Ji-tao SANG, Shu-hui WANG, Jing LIU, and Bu-yue QIAN for their valuable discussions and assistance. ...
doi:10.1631/fitee.1601787
fatcat:dqnizhdlbfhpvodzkhv5nlarxq
Artificial intelligence for clinical decision support in neurology
2020
Brain Communications
Despite the clinical promise of artificial intelligence, machine and deep learning algorithms are not a one-size-fits-all solution for all types of clinical data and questions. ...
In this paper, we provide an overview of the core concepts of artificial intelligence, particularly contemporary deep learning methods, to give clinician and neuroscience researchers an appreciation of ...
We also acknowledge the facilities, and the scientific and technical assistance of the National Imaging Facility (NIF), an Australian Government National Collaborative Research Infrastructure Strategy ...
doi:10.1093/braincomms/fcaa096
pmid:33134913
pmcid:PMC7585692
fatcat:6qqvkuhn3nfzjljaykc3ihvsge
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering
[article]
2020
arXiv
pre-print
Visual Question Answering in Medical domain (VQA-Med) plays an important role in providing medical assistance to the end-users. ...
To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. ...
The fundamental concept behind all these attentive models is that, for answering a specific question, certain visual areas in an image and certain words in a question provide more information than others ...
arXiv:2009.12770v1
fatcat:d2dmtduat5b3bm4ujgyryh474y