42,555 Hits in 9.8 sec

Where to put the image in an image caption generator

2018 Natural Language Engineering  
AbstractWhen a recurrent neural network (RNN) language model is used for caption generation, the image information can be fed to the neural network either by directly incorporating it in the RNN – conditioning  ...  Our results suggest that the visual and linguistic modalities for caption generation need not be jointly encoded by the RNN as that yields large, memory-intensive models with few tangible advantages in  ...  Scholarships are part-financed by the European Union -European Social Fund (ESF) -Operational Programme II Cohesion Policy 2014-2020 Investing in human capital to create more opportunities and promote  ... 
doi:10.1017/s1351324918000098 fatcat:qi2sp6tys5bjrf4zepcvnuz33e

Image Caption Generation With Adaptive Transformer

Wei Zhang, Wenbo Nie, Xinle Li, Yao Yu
2019 2019 34th Youth Academic Annual Conference of Chinese Association of Automation (YAC)  
Secondly, we combine spatial attention and adaptive attention into the Transformer, which enables the decoder to determine where and when to use image region information.  ...  Encoder-decoder framework based image captioning has made promising progress. The application of various attention mechanisms has also greatly improved the performance of caption models.  ...  In the image captioning task, the encoder should extract the image features to obtain a context vector, then feed it into the decoder to generate the language description.  ... 
doi:10.1109/yac.2019.8787715 fatcat:x3v7vilksvhobgpvdfrovj4psi
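The context-vector idea in the snippet above (the encoder extracts per-region features and the decoder attends over them to build a context vector) can be sketched in plain Python. The dot-product scoring and the toy feature vectors below are illustrative assumptions, not the paper's actual Transformer model:

```python
import math

def attention_context(region_feats, decoder_state):
    """Score each image region against the decoder state, softmax the
    scores, and return the attention-weighted context vector."""
    # dot-product relevance score per region
    scores = [sum(r * s for r, s in zip(feat, decoder_state))
              for feat in region_feats]
    # numerically stabilised softmax over the scores
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # context vector = weighted sum of region features
    dim = len(region_feats[0])
    return [sum(w * feat[d] for w, feat in zip(weights, region_feats))
            for d in range(dim)]

regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy per-region features
state = [1.0, 0.0]  # decoder state "asking" for the first feature axis
ctx = attention_context(regions, state)
```

Regions aligned with the decoder state receive higher attention weight, so the context vector leans toward their features.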

An Advanced Image Captioning using combination of CNN and LSTM

Priyanka Raut, et al.
2021 Turkish Journal of Computer and Mathematics Education  
Image captioning is nowadays gaining a lot of interest; it generates an automated, simple, and short sentence describing the image content.  ...  Training machines so that they can understand image content and generate captions that are almost accurate at a human level of knowledge is a very tedious and interesting task.  ...  In this task, a machine is fed with an input image and, based on the intelligence and training given, the model generates a simple caption which explains the content of the image in a human-readable  ... 
doi:10.17762/turcomat.v12i1s.1593 fatcat:sb2nzh7wy5eg3fnz7ogxq34ohy
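The CNN + LSTM pipeline the abstract describes (image in, short caption out) reduces to a greedy decoding loop once next-word scores are available. The stub scorer below stands in for a trained CNN encoder and LSTM decoder; the vocabulary and the hard-coded transition table are purely hypothetical:

```python
# Toy vocabulary and a stub "model" that scores the next word from the
# image feature plus the words generated so far.
VOCAB = ["<start>", "a", "dog", "runs", "<end>"]

def next_word_scores(image_feat, prefix):
    # Stand-in for a CNN encoder + LSTM decoder: a hand-written table
    # that always continues "<start> a dog runs <end>".
    table = {
        ("<start>",): "a",
        ("<start>", "a"): "dog",
        ("<start>", "a", "dog"): "runs",
        ("<start>", "a", "dog", "runs"): "<end>",
    }
    target = table.get(tuple(prefix), "<end>")
    return [1.0 if w == target else 0.0 for w in VOCAB]

def greedy_caption(image_feat, max_len=10):
    """Greedily pick the highest-scoring word until <end> is emitted."""
    words = ["<start>"]
    for _ in range(max_len):
        scores = next_word_scores(image_feat, words)
        best = VOCAB[scores.index(max(scores))]
        if best == "<end>":
            break
        words.append(best)
    return " ".join(words[1:])

caption = greedy_caption(image_feat=[0.2, 0.7])  # → "a dog runs"
```

A real system replaces the table with learned LSTM logits and usually swaps greedy search for beam search.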

Learning Object Context for Dense Captioning

Xiangyang Li, Shuqiang Jiang, Jungong Han
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
in the image.  ...  Dense captioning is a challenging task which not only detects visual elements in images but also generates natural language sentences to describe them.  ...  Acknowledgements This work was supported in part by the National Natural Science Foundation of China under Grant 61532018, in part by the Lenovo Outstanding Young Scientists Program, in part by National  ... 
doi:10.1609/aaai.v33i01.33018650 fatcat:pla6mugwcfaovjuukqpnzhckca

Audio Assistant Based Image Captioning System Using RLSTM and CNN

D Akash Reddy, T. Venkat Raju, V. Shashank
2022 International Journal for Research in Applied Science and Engineering Technology  
To vanquish this situation, we developed an audio-based image captioner that will identify the objects in an image and form a meaningful sentence that gives the output in aural form.  ...  We used NLP (Natural Language Processing) to understand the description of an image and convert the text to speech.  ...  This research was made possible under the guidance, support, and motivation provided by our faculty, who encouraged us to pursue our interests in the field of image processing.  ... 
doi:10.22214/ijraset.2022.44289 fatcat:xg6oqawmezfe3aikws5iknt3lu

Multimodal feature fusion based on object relation for video captioning

Zhiwen Yan, Ying Chen, Jinlong Song, Jia Zhu
2022 CAAI Transactions on Intelligence Technology  
Video captioning aims at automatically generating a natural language caption to describe the content of a video.  ...  However, most of the existing methods in the video captioning task ignore the relationship between objects in the video and the correlation between multimodal features, and they also ignore the effect  ...  ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China under Grant 62077015 and the Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province  ... 
doi:10.1049/cit2.12071 fatcat:bmyvu6sr6zbqtac6jxfqiygutu

SPASS: Scientific Prominence Active Search System with Deep Image Captioning Network [article]

Dicong Qiu
2018 arXiv   pre-print
the auto-generated captions to the prespecified search tasks by certain metrics so as to prioritize those images for transmission.  ...  Scientists can prespecify such search tasks in natural language and upload them to a rover, on which the deployed system constantly captions captured images with a deep image captioning network and compare  ...  Acknowledgement The work described in this paper was carried out at NASA's Jet Propulsion Laboratory, California Institute of Technology.  ... 
arXiv:1809.03385v1 fatcat:s4nf4bwphnaznoy3gn6wmwe5oy

Can Neural Image Captioning be Controlled via Forced Attention? [article]

Philipp Sadler, Tatjana Scheffler, David Schlangen
2019 arXiv   pre-print
Specifically, we take a standard neural image captioning model that uses attention, and fix the attention to pre-determined areas in the image.  ...  We evaluate whether the resulting output is more likely to mention the class of the object in that area than the normally generated caption.  ...  An important detail is that no value is actually zero. The model is still allowed to include image aspects outside the boxes for the caption generation.  ... 
arXiv:1911.03936v1 fatcat:bzvowacyozg2tiozgq3i5iez4e
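The forced-attention idea in this entry (fix the attention to predetermined boxes, yet, as the last snippet notes, never drive any weight to exactly zero) can be sketched as a masked renormalisation. The epsilon floor and the list-based representation are assumptions for illustration, not the paper's exact procedure:

```python
def forced_attention(weights, inside_box, eps=1e-3):
    """Concentrate attention mass on regions inside the target box while
    keeping a small non-zero floor elsewhere, then renormalise so the
    weights still sum to one."""
    raw = [w if inside else eps
           for w, inside in zip(weights, inside_box)]
    total = sum(raw)
    return [r / total for r in raw]

# Three image regions; only the first lies inside the forced box.
forced = forced_attention([0.4, 0.3, 0.3], [True, False, False])
```

Because outside regions keep a tiny weight, the model can still draw on image aspects beyond the box, matching the observation quoted above.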

Social Image Captioning: Exploring Visual Attention and User Attention

Leiquan Wang, Xiaoliang Chu, Weishan Zhang, Yiwei Wei, Weichen Sun, Chunlei Wu
2018 Sensors  
The user-contributed tags, which could reflect the user attention, have been neglected in conventional image captioning.  ...  Most existing image captioning models cannot be applied directly to social image captioning.  ...  [13] proposed an adaptive attention model, which is able to decide when and where to attend to the image. Park et al.  ... 
doi:10.3390/s18020646 pmid:29470409 pmcid:PMC5855536 fatcat:yxar6kllbrcqtcqo3vfnwnvu34

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning [article]

Ahmed Elhagry, Karima Kadaoui
2021 arXiv   pre-print
Image Captioning is a task that combines computer vision and natural language processing, where it aims to generate descriptive legends for images.  ...  This review paper serves as a roadmap for researchers to keep up to date with the latest contributions made in the field of image caption generation.  ...  In [18], the paper puts the spotlight on some of the advancements in the image captioning task up to early 2020, where various approaches were discussed, including N-cut, color-based segmentation, and hybrid  ... 
arXiv:2107.13114v1 fatcat:47ae4rfytne3nithiai4jtm6zy

Reconciling Image Captioning and User's Comments for Urban Tourism

Yazid Bounab, Mourad Oussalah, Ahlam Ferdenache
2020 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA)  
In the era of digital tourism, this offers a valuable framework to reconcile the widely available tourism images and user-generated content.  ...  Image captioning, as a process of assigning a textual description to an image, has gained momentum nowadays thanks to recent advances in deep-learning-related architectures and the availability of associated tools  ...  First, it extracts the visual content of an image, then passes it to some pretrained language model to generate a caption [12], [13].  ... 
doi:10.1109/ipta50016.2020.9286602 fatcat:hbdlk7noevck5lpygsxmxd7fmi

Attention-based CNN-GRU Model For Automatic Medical Images Captioning: ImageCLEF 2021

Djamila Romaissa Beddiar, Mourad Oussalah, Tapio Seppänen
2021 Conference and Labs of the Evaluation Forum  
We addressed the challenge of medical image captioning by combining a CNN encoder model with an attention-based GRU language generator model, while a multi-label CNN classifier is used for the concept  ...  Understanding and interpreting medical images is a very important task in medical diagnosis generation.  ...  Acknowledgments This work is supported by the Academy of Finland Profi5 DigiHealth project (#326291), which is gratefully acknowledged.  ... 
dblp:conf/clef/Beddiar0S21 fatcat:q2a5vbkefneaxfcoap6bglexky

Multi-view pedestrian captioning with an attention topic CNN model

Quan Liu, Yingying Chen, Jinqiao Wang, Sijiong Zhang
2018 Computers in industry (Print)  
Therefore, in this paper, we propose a novel approach to generate multi-view captions for pedestrian images with a topic attention mechanism on global and local semantic regions.  ...  This feature vector is taken as input to a hierarchical recurrent neural network to generate multi-view captions for pedestrian images.  ...  Acknowledgment This work was supported by the National Natural Science Foundation of China under Grant 61772527.  ... 
doi:10.1016/j.compind.2018.01.015 fatcat:thfb3bi5gzcqhjahxdkuri47wy

Semantically Invariant Text-to-Image Generation

Shagan Sah, Dheeraj Peri, Ameya Shringi, Chi Zhang, Miguel Dominguez, Andreas Savakis, Ray Ptucha
2018 2018 25th IEEE International Conference on Image Processing (ICIP)  
Firstly, an n-gram metric based cost function is introduced that generalizes the caption with respect to the image.  ...  Along with MMVR, we propose two improvements to text-conditioned image generation.  ...  The forward pass is initiated by passing a random latent vector h_t into the image generator, which generates an image x̂. The image captioner uses the generated image to create a caption.  ... 
doi:10.1109/icip.2018.8451656 dblp:conf/icip/SahPSZDSP18 fatcat:glthffnurramdf77thd677zcky

Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks [article]

Satya Krishna Gorti, Jeremy Ma
2018 arXiv   pre-print
Text-to-Image translation has been an active area of research in the recent past.  ...  We address this issue by using a captioning network to caption on generated images and exploit the distance between ground truth captions and generated captions to improve the network further.  ...  LSTM in the captioning network given an image I = G(z, ψ(t)) generated by the image synthesis network's generator.  ... 
arXiv:1808.04538v1 fatcat:bxlenocucnhvnpkglwkzz5wg3a
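The cycle signal this entry describes (caption the generated image, then penalise the distance between that caption and the ground-truth caption) needs some caption-distance function. A bag-of-words Jaccard distance is one simple stand-in; the actual system would use a learned or n-gram metric, so this function is only illustrative:

```python
def caption_distance(reference, generated):
    """Bag-of-words Jaccard distance between the ground-truth caption
    and the caption produced for the generated image: 0.0 when the
    captions share all tokens, 1.0 when they share none."""
    ref = set(reference.lower().split())
    gen = set(generated.lower().split())
    if not ref and not gen:
        return 0.0
    overlap = len(ref & gen)
    return 1.0 - overlap / len(ref | gen)

perfect = caption_distance("a red bird", "a red bird")   # → 0.0
poor = caption_distance("a red bird", "a blue car")
```

Feeding this distance back as an extra loss term rewards the image generator for producing images whose captions match the original text.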