Multi-modal Memory Enhancement Attention Network for Image-Text Matching
2020
IEEE Access
by constructing a Multi-Modal Memory Enhancement (M3E) module. ...
Image-text matching is an attractive research topic in the community of vision and language. ...
CONCLUSION In this paper, we proposed a novel Multi-modal Memory Enhancement Attention Network (M3A-Net) for image-text matching. ... (a memory-attention sketch follows this entry)
doi:10.1109/access.2020.2975594
fatcat:ciiubythzzevpkw2ip5csnjwf4
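The snippet does not describe the M3E module's internals, but the general memory-enhancement pattern it names can be sketched as features attending over a learned memory bank. The module name, slot count, and residual update below are illustrative assumptions, not M3A-Net's actual design.

```python
# Hedged sketch of memory-enhancement attention: region/word features read
# from a learned memory bank and are enhanced with a residual update.
import torch
import torch.nn as nn

class MemoryEnhancement(nn.Module):
    def __init__(self, dim: int, memory_slots: int = 64):
        super().__init__()
        # Learnable memory bank shared across samples (assumption).
        self.memory = nn.Parameter(torch.randn(memory_slots, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n, dim) region or word features.
        q = self.query_proj(feats)                                      # (b, n, d)
        attn = torch.softmax(q @ self.memory.t() / q.size(-1) ** 0.5, dim=-1)
        read = attn @ self.memory                                       # (b, n, d)
        return feats + read                                             # residual enhancement

feats = torch.randn(2, 36, 512)         # e.g., 36 detected image regions
enhanced = MemoryEnhancement(512)(feats)
print(enhanced.shape)                   # torch.Size([2, 36, 512])
```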
Review of Recent Deep Learning Based Methods for Image-Text Retrieval
2020
2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)
In this paper, we highlight key points of recent cross-modal retrieval approaches based on deep-learning, especially in the image-text retrieval context, and classify them into four categories according ...
Extracting relevant information efficiently from large-scale multi-modal data is becoming a crucial problem of information retrieval. ...
for more effective image-text matching. ...
doi:10.1109/mipr49039.2020.00042
dblp:conf/mipr/ChenZBK20
fatcat:fps5wiw4ezf7teko3vegaxq4tq
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
2020
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Image-Text Matching is one major task in cross-modal information processing. The main challenge is to learn the unified visual and textual representations. ...
Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, which mainly consists of two modules: a separate semantic relations module ... (a graph-attention sketch follows this entry)
[41] proposed a cross memory network with pair discrimination to capture the common knowledge between the image and text modalities. More specialized mechanisms are used in global-level matching. ...
doi:10.1109/tcsvt.2020.3030656
fatcat:ymindb2imnbgnlmitnkziskkmi
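As a rough illustration of the graph-attention machinery such a semantic relations module builds on, here is a minimal single-head graph attention layer over a fully connected graph of region features. The fully connected adjacency and all names are assumptions, not DSRAN's exact formulation.

```python
# Minimal GAT-style layer: every node attends to every other node via a
# learned scoring of concatenated node pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n, dim) region (or word) features as graph nodes.
        h = self.proj(nodes)                                    # (b, n, d)
        n = h.size(1)
        # Pairwise concatenation [h_i ; h_j] for all node pairs.
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)               # (b, n, n, d)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)               # (b, n, n, d)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], -1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                        # (b, n, n)
        return torch.relu(alpha @ h)                            # relation-enhanced nodes
```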
LILE: Look In-Depth before Looking Elsewhere – A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives
[article]
2022
arXiv
pre-print
Most contemporary works apply cross attention to highlight the essential elements of an image or text in relation to the other modalities and try to match them together. ...
Furthermore, the era of networks that process each modality separately has practically ended. ...
Attention and Gated Memory Blocks: After the representation for each modality instance has been extracted, a multi-head self-attention module is applied to obtain m enhanced feature maps for the extracted features ... (a self-attention sketch follows this entry)
arXiv:2203.01445v2
fatcat:onogf45adrgcvjd5psnm25sbam
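The multi-head self-attention step quoted above maps directly onto stock PyTorch. The sketch below assumes a residual connection and an 8-head configuration, neither of which is specified in the snippet.

```python
# Per-modality enhancement via multi-head self-attention over already
# extracted features, using torch.nn.MultiheadAttention.
import torch
import torch.nn as nn

dim, heads = 512, 8
mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)

feats = torch.randn(2, 50, dim)          # extracted image or text features
enhanced, _ = mhsa(feats, feats, feats)  # each position attends to all others
enhanced = enhanced + feats              # residual connection (assumption)
```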
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
[article]
2020
arXiv
pre-print
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. ...
These models go beyond simple image classifiers in that they can perform uni-directional (e.g., image captioning, image generation) and bi-directional (e.g., cross-modal retrieval, visual question answering ...
To compensate for these limitations, word-level attention [53], hierarchical text-to-image mapping [46] and memory networks [59] have been explored. ...
arXiv:2010.08189v1
fatcat:2l7molbcn5hf3oyhe3l52tdwra
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
2020
Neurocomputing
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. ...
These models go beyond simple image classifiers in that they can perform uni-directional (e.g., image captioning, image generation) and bi-directional (e.g., cross-modal retrieval, visual question answering ...
To compensate for these limitations, word-level attention [53], hierarchical text-to-image mapping [46] and memory networks [59] have been explored. ...
doi:10.1016/j.neucom.2020.10.042
fatcat:hyjkj5enozfrvgzxy6avtbmoxu
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
[article]
2020
arXiv
pre-print
In this paper, to address such a deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured with multiple ...
Enabling bi-directional retrieval of images and texts is important for understanding the correspondence between vision and language. ...
Conclusion In this paper, we propose an Iterative Matching method with a Recurrent Attention Memory network (IMRAM) for cross-modal image-text retrieval, to handle the complexity of semantics. ... (an iterative-matching sketch follows this entry)
arXiv:2003.03772v1
fatcat:s2hqfom3ira4blfaxazzaso73a
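IMRAM's core loop, as the abstract describes it, alternates cross-attention alignment with a memory update over several steps. The sketch below follows that outline; the gated update stands in for the paper's memory distillation unit and is an assumption, as are all names.

```python
# Iterative cross-attention matching: queries from one modality repeatedly
# attend to the other, accumulating a per-step alignment score.
import torch
import torch.nn as nn

class IterativeMatcher(nn.Module):
    def __init__(self, dim: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query: (b, nq, d) e.g. word features; context: (b, nc, d) region features.
        sims = []
        for _ in range(self.steps):
            attn = torch.softmax(query @ context.transpose(1, 2), dim=-1)
            attended = attn @ context                           # (b, nq, d)
            # Per-step alignment score: mean cosine similarity.
            sims.append(torch.cosine_similarity(query, attended, dim=-1).mean(-1))
            # Gated update refines the query before the next step (assumption).
            g = torch.sigmoid(self.gate(torch.cat([query, attended], dim=-1)))
            query = g * query + (1 - g) * attended
        return torch.stack(sims, dim=-1).sum(-1)                # (b,) matching score
```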
Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog
[article]
2020
arXiv
pre-print
Specifically, PESRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representations of stickers and utterances. ...
Then, we model the user preference by using the recently selected stickers as input, and use a key-value memory network to store the preference representation. ... (a key-value memory sketch follows this entry)
As for sticker recommendation, existing works such as [42] and apps like Hike or QQ directly match the text typed by the user to the short text tag assigned to each sticker. ...
arXiv:2011.03322v1
fatcat:krkee37danaipbpbeozwfgc644
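The key-value memory read that stores user preference can be sketched as follows, assuming recently selected stickers supply the key/value slots and the current dialog state supplies the query; slot counts and names are illustrative, not PESRS's exact design.

```python
# Soft key-value memory read: attention over keys, weighted sum of values.
import torch

def read_preference(query, keys, values):
    # query: (b, d); keys/values: (b, slots, d) from recently selected stickers.
    attn = torch.softmax(keys @ query.unsqueeze(-1), dim=1)    # (b, slots, 1)
    return (attn * values).sum(dim=1)                          # (b, d) preference

q = torch.randn(4, 256)
k = torch.randn(4, 10, 256)   # 10 recently selected stickers
v = torch.randn(4, 10, 256)
pref = read_preference(q, k, v)
print(pref.shape)             # torch.Size([4, 256])
```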
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
[article]
2022
arXiv
pre-print
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence. ...
To integrate evidence and cues from both modalities, we introduce the concept of 'multi-modal cycle-consistency check'; starting from the image/caption, we gather textual/visual evidence, which will be ...
We also thank Rebecca Weil for helpful advice and feedback. ...
arXiv:2112.00061v3
fatcat:7w5ndinlbjht7b5e7elzyozycy
Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service
2021
IEEE Transactions on Services Computing
We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance ...
Third, we further incorporate TFIDF-enhanced category semantics to improve the mapping of the image modality and to regulate the similarity loss function during the iterative learning of the cross-modal joint ... (a TF-IDF weighting sketch follows this entry)
Stacked Attention Networks (SAN) [7]: SAN applied a stacked attention network to simultaneously locate ingredient regions in the image and learn multi-modal embedding features between ingredient features ...
doi:10.1109/tsc.2021.3098834
fatcat:p6qstgiejbe53p7gnyl2mrfxce
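One plausible reading of "TFIDF enhanced" is that token features are pooled with TF-IDF weights before projection into the joint space. The sketch below implements that reading only; it should not be taken as MSJE's exact formulation.

```python
# TF-IDF weighted pooling of token features into a joint embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TfidfTextEncoder(nn.Module):
    def __init__(self, dim: int, joint_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(dim, joint_dim)

    def forward(self, token_feats, tfidf):
        # token_feats: (b, n, d); tfidf: (b, n) per-token TF-IDF weights.
        w = tfidf / tfidf.sum(dim=-1, keepdim=True).clamp(min=1e-8)
        pooled = (w.unsqueeze(-1) * token_feats).sum(dim=1)    # weighted pooling
        return F.normalize(self.proj(pooled), dim=-1)          # unit-norm embedding
```

Retrieval then reduces to cosine similarity between this text embedding and an image embedding normalized the same way.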
Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network
[article]
2021
arXiv
pre-print
Considering that, we propose Bangla Attentional Generative Adversarial Network (AttnGAN) that allows intensified, multi-stage processing for high-resolution Bangla text-to-image generation. ...
For the first time, a fine-grained image is generated from Bangla text using an attentional GAN. Bangla ranks 7th among the 100 most spoken languages. ...
Second, a deep attentional multi-modal similarity model is presented for training the generator, which computes a fine-grained image-text matching loss for the generated images. ... (a simplified matching-loss sketch follows this entry)
arXiv:2109.11749v1
fatcat:vezzdd6dyzd4lleltsk5ix2ho4
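A deep attentional multi-modal similarity model of this kind scores a generated image against its text by letting each word attend to image regions. The simplified score below omits the temperature schedule and the batch-contrastive loss used in practice, and all names are illustrative.

```python
# Word-to-region attention score: each word attends to image regions, then
# word-level similarities are averaged into one image-text matching score.
import torch

def attn_match_score(words, regions, temp: float = 0.1):
    # words: (n_words, d); regions: (n_regions, d), assumed L2-normalized.
    attn = torch.softmax(words @ regions.t() / temp, dim=-1)   # word -> region
    attended = attn @ regions                                  # (n_words, d)
    return torch.cosine_similarity(words, attended, dim=-1).mean()
```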
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features
[article]
2022
arXiv
pre-print
This paper introduces a novel method, called Multi-modAl Text Recognition Network (MATRN), that enables interactions between visual and semantic features for better recognition performance. ...
Based on the spatial encoding, visual and semantic features are enhanced by referring to related features in the other modality. ... (a cross-attention sketch follows this entry)
To answer the question, this paper proposes a new STR model, named Multi-modAl Text Recognition Network (MATRN), that enhances visual and semantic features by referring to features in both modalities. ...
arXiv:2111.15263v2
fatcat:sgewvxzf2jfnrah7knkpekrthu
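The bidirectional enhancement between visual and semantic features can be sketched with two stock cross-attention modules; MATRN's spatial encoding, which the snippet says the enhancement is based on, is omitted here, so this is only a structural sketch.

```python
# Bidirectional cross-modal enhancement: each stream queries the other.
import torch
import torch.nn as nn

dim, heads = 384, 6
v2s = nn.MultiheadAttention(dim, heads, batch_first=True)
s2v = nn.MultiheadAttention(dim, heads, batch_first=True)

vis = torch.randn(2, 64, dim)    # visual feature map, flattened
sem = torch.randn(2, 25, dim)    # semantic (character-level) features

vis_enh, _ = s2v(vis, sem, sem)  # visual queries, semantic keys/values
sem_enh, _ = v2s(sem, vis, vis)  # semantic queries, visual keys/values
```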
Holistic Multi-modal Memory Network for Movie Question Answering
[article]
2018
arXiv
pre-print
In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop. ...
Therefore, the proposed framework effectively integrates multi-modal context, question, and answer information, which leads to more informative context retrieved for question answering. ...
CONCLUSION We presented a Holistic Multi-modal Memory Network framework that learns to answer questions with context from multi-modal data. ... (a multi-hop memory sketch follows this entry)
arXiv:1811.04595v1
fatcat:xlxphlnk4rdspixvveeqftl7pu
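The hop structure described above, where the question and multi-modal context interact in every hop, can be sketched as a query repeatedly reading from a memory of fused context features. The hop count and the additive update are assumptions, not HMMN's exact formulation.

```python
# Multi-hop memory read: the question vector attends over context memories
# and is refined with what it retrieves at each hop.
import torch

def multi_hop_read(question, memories, hops: int = 3):
    # question: (b, d); memories: (b, n, d) fused video/subtitle features.
    q = question
    for _ in range(hops):
        attn = torch.softmax(memories @ q.unsqueeze(-1), dim=1)  # (b, n, 1)
        read = (attn * memories).sum(dim=1)                      # (b, d)
        q = q + read                     # question refined by retrieved context
    return q
```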
Why Do We Click: Visual Impression-aware News Recommendation
[article]
2021
arXiv
pre-print
Besides, existing research pays little attention to the click decision-making process in designing multi-modal modeling modules. ...
To accurately capture users' interests, we propose to model multi-modal features, in addition to the news titles that are widely used in existing works, for news recommendation. ...
IMRec, for multi-modal news recommendation. ...
arXiv:2109.12651v1
fatcat:pcjk6p7c4rbbrgc2hovl6zyfku
Enterprise Strategic Management From the Perspective of Business Ecosystem Construction Based on Multimodal Emotion Recognition
2022
Frontiers in Psychology
The self-attention mechanism is applied in experiments that compare the accuracy of single-modal and multi-modal ER. ...
Then, two datasets, CMU-MOSI and CMU-MOSEI, are selected to design the scheme for multimodal ER based on self-attention mechanism. ...
For this model, it is only necessary to apply the self-attention mechanism and Bi-GRU to the three modalities of text, image, and audio. ... (a per-modality encoder sketch follows this entry)
doi:10.3389/fpsyg.2022.857891
pmid:35310264
pmcid:PMC8927019
doaj:82cf2c71b7bf4e4f9bdeda763b6e1939
fatcat:hssh4dpwzbahvpv5vyupuuoxuu
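The per-modality pipeline named in the snippet (Bi-GRU followed by self-attention, once per modality) maps onto stock layers; all dimensions below are illustrative, not from the paper.

```python
# One modality branch: Bi-GRU over the input sequence, then self-attention
# over time steps, then mean pooling into a modality representation.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim: int, hid: int = 128, heads: int = 4):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hid, heads, batch_first=True)

    def forward(self, x):
        h, _ = self.gru(x)                  # (b, t, 2*hid)
        out, _ = self.attn(h, h, h)         # self-attention over time steps
        return out.mean(dim=1)              # pooled modality representation

text_enc = ModalityEncoder(300)             # e.g., word vectors
rep = text_enc(torch.randn(8, 20, 300))     # one of three modality branches
```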
Showing results 1 — 15 out of 12,551 results