Guiding the Long-Short Term Memory Model for Image Caption Generation

Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars
2015 2015 IEEE International Conference on Computer Vision (ICCV)  
In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short.  ...  In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the  ...  The authors acknowledge the support of the IWT-SBO project PARIS and the iMinds project HiViz.  ... 
doi:10.1109/iccv.2015.277 dblp:conf/iccv/JiaGFT15 fatcat:zavbil3m6fdvljmtypczbn4exi

Guiding Long-Short Term Memory for Image Caption Generation [article]

Xu Jia and Efstratios Gavves and Basura Fernando and Tinne Tuytelaars
2015 arXiv   pre-print
In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short.  ...  In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the  ...  Acknowledgment The authors acknowledge the support of the IWT-SBO project PARIS.  ... 
arXiv:1509.04942v1 fatcat:54hccgjkxberjgjznjmbqxpbku

Image Captioning in Real Time

Ankit Patil, Karishma Saudagar, Atul Maharnawar, Tejas Rangatwan, I. Priyadarshini
2022 Zenodo  
The current development in Deep Learning based Machine Translation and Computer Vision have led to incredible Image Captioning models using advanced techniques like Deep Learning.  ...  This model uses a hybrid CNN-RNN model, where the CNN part of the model system uses the Xception model for transfer learning, and RNNs are widely used in language modeling.  ...  In his work "LONG TERM MEMORY", Sepp Hochreiter describes the Short Term Neural Memory Network algorithm of the Short Term Team (LSTM). LSTMs are both spatially and temporally local.  ... 
doi:10.5281/zenodo.6759892 fatcat:j2zygh7a5nbirnbv4bpplk4jzi

Multimodal Memory Modelling for Video Captioning [article]

Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan
2016 arXiv   pre-print
First, text representation in the Long Short-Term Memory (LSTM) based text decoder is written into the memory, and the memory contents will be read out to guide an attention to select related visual targets  ...  In this paper, we propose a Multimodal Memory Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide global visual  ...  That is to say, explicitly introducing memory into video captioning can not only model the long-term visual-textual dependency, but also guide visual attention for better video representation.  ... 
arXiv:1611.05592v1 fatcat:isoxd3hx7vha3f6fgqatnbrwe4

Image Caption Generator Using Deep Learning

A. V. N. Kameswari
2021 International Journal for Research in Applied Science and Engineering Technology  
Keywords: Image Caption Generator, Convolutional Neural Network, Long Short-Term Memory, Bleu score, Flickr_8K  ...  After CNN-LSTM model is defined we give an image file as parameter through command prompt for testing image caption generator and it generates the caption of an image and its accuracy is observed by calculating  ...  ) and Long Short-Term Memory (LSTM) which helps to generate accurate results.  ... 
doi:10.22214/ijraset.2021.38652 fatcat:jbokbbd47zc4nhco7dywnr4lji

Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption

Wei Zhang, Yue Ying, Pan Lu, Hongyuan Zha
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Personalized image caption, a natural extension of the standard image caption task, requires to generate brief image descriptions tailored for users' writing style and traits, and is more practical to  ...  And at the high level, the multimodal encoder integrates target image representations with short-term literal-preference, as well as long-term literal-preference learned from user IDs.  ...  Acknowledgments The authors would like to thank the anonymous reviewers for their valuable suggestions.  ... 
doi:10.1609/aaai.v34i05.6503 fatcat:fmqhiw2o3zdtppmgekcczebvbe

Image Captioning Generator Using CNN and LSTM

M. Pranay Kumar
2022 International Journal for Research in Applied Science and Engineering Technology  
Create captions for a picture This is what we've done in Python-based research in which we applied CNN's deep learning algorithm (Convolutional Neural Networks) and LSTM (Long Short-Term Memory) are two  ...  Abstract: The project's goal is to come up with a caption for an image. Photograph captioning is the process of making a description for an image.  ...  SUNIL BHUTADA, Head of the Department of Information Technology, Sreenidhi Institute of Science and Technology for his support and invaluable time.  ... 
doi:10.22214/ijraset.2022.44502 fatcat:o7cetll5nrggrhy3orp3jfwgfa

Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory

Pengfei Cao, Zhongyi Yang, Liang Sun, Yanchun Liang, Mary Qu Yang, Renchu Guan
2019 Neural Processing Letters  
Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning.  ...  Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long term relations between visual features and semantic information by making use of both historical and  ...  long short-term memory network (Bi-gLSTM).  ... 
doi:10.1007/s11063-018-09973-5 pmid:35035261 pmcid:PMC8758065 fatcat:3dgfjsf54rcwtbo3kiuvasqzie

A sequential guiding network with attention for image captioning [article]

Daouda Sow and Zengchang Qin and Mouhamed Niasse and Tao Wan
2019 arXiv   pre-print
The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained in an end-to-end manner by using image/descriptions  ...  In this paper, we introduce a sequential guiding network that guides the decoder during word generation.  ...  By modeling the guiding network with a Long Short-Term Memory, the guiding vector can be adjusted at each time step based on the current context and high-level image attributes.  ... 
arXiv:1811.00228v3 fatcat:735nrfjbo5c7pjzl5nxgnp75ei

Two-Tier LSTM Model for Image Caption Generation

Phyu Khaing (University of Computer Studies), May Yu (University of Computer Studies)
2021 International Journal of Intelligent Engineering and Systems  
) and long-term memory networks (LSTMs) are essential for some of the biggest breakthroughs.  ...  The proposed model also improves the sentence generation efficiency and can achieve better performance for image caption generation.  ...  May The` Yu, for her supportive and constructive guidance on the planning and development of this research work.  ... 
doi:10.22266/ijies2021.0831.03 fatcat:uaejmyx7urgdphwdopo27a6wiy

M3: Multimodal Memory Modelling for Video Captioning

Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
However, learning an effective mapping from the visual sequence space to the language space is still a challenging problem due to the long-term multimodal dependency modelling and semantic misalignment  ...  Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide visual attention on described visual targets to solve  ...  In addition, this work is also supported by grants from NVIDIA and the NVIDIA DGX-1 AI Supercomputer.  ... 
doi:10.1109/cvpr.2018.00784 dblp:conf/cvpr/Wang000T18 fatcat:aam4d7tqprha3ayctodnni7mfu

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning [article]

Yang Xian, Yingli Tian
2017 arXiv   pre-print
In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning model is proposed to handle uncontrolled imbalanced real-world image-sentence dataset.  ...  Descriptions in FlickrNYC dataset vary dramatically ranging from short term-descriptions to long paragraph-descriptions and can describe any visual aspects, or even refer to objects that are not depicted  ...  In the proposed framework, a self-guiding multimodal long short-term memory (sg-LSTM) framework is presented to leverage between two portions of the data: data s (images with shorter length of descriptions  ... 
arXiv:1709.05038v1 fatcat:dh2eze5gcjhmngqo4u6neq5n3u

Boosting Memory with a Persistent Memory Mechanism for Remote Sensing Image Captioning

Kun Fu, Yang Li, Wenkai Zhang, Hongfeng Yu, Xian Sun
2020 Remote Sensing  
However, the Long Short-Term Memory (LSTM) network used in decoders still loses some information in the picture over time when the generated caption is long.  ...  This method can pick up the long-term information missed from the LSTM but useful to the caption generation.  ...  The authors would like to express their sincere appreciation for the reviewers for their helpful comments and suggestions. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/rs12111874 fatcat:fv2ugsbin5aadhackifidyjnji

Image Based Review Text Generation with Emotional Guidance [article]

Xuehui Sun, Zihan Zhou, Yuda Fan
2019 arXiv   pre-print
We made several adjustments to the existing image-captioning model to fit our task, in which we should also take non-image features into consideration.  ...  However, rare researches focus on generating product review texts, which is ubiquitous in the online shopping malls and is crucial for online shopping selection and evaluation.  ...  [7] introduces an extended LSTM model called Guiding Long-Short Term Memory network (gLSTM).  ... 
arXiv:1901.04140v1 fatcat:hti5xdjuova37bit3mxs32ml3a

Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [article]

Dandan Guo, Ruiying Lu, Bo Chen, Zequn Zeng, Mingyuan Zhou
2021 arXiv   pre-print
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model, including Long Short-Term Memory (LSTM) and Transformer, and jointly optimized  ...  with a deep topic model to guide the learning of a language model.  ...  architectures for multi-model image captioning.  ... 
arXiv:2105.04143v1 fatcat:64tro6caqbgfzmw4yxzw7rp2t4