
YH Technologies at ActivityNet Challenge 2018 [article]

Ting Yao, Xue Li
2018 arXiv   pre-print
This notebook paper presents an overview and comparative analysis of our systems designed for the following five tasks in ActivityNet Challenge 2018: temporal action proposals, temporal action localization, dense-captioning events in videos, trimmed action recognition, and spatio-temporal action localization.  ...  Recurrent Tubelet Recognition (RTR) networks: the RTR networks capitalize on a multi-channel architecture for tubelet proposal recognition.  ... 
arXiv:1807.00686v1 fatcat:teqzrapbovacvltvvgbffx5zbu

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism [article]

Rashid Khan, M Shujah Islam, Khadija Kanwal, Mansoor Iqbal, Md. Imran Hossain, Zhongfu Ye
2022 arXiv   pre-print
Image captioning is a fast-growing research field of computer vision and natural language processing that involves creating text explanations for images  ...  using a recurrent neural network (RNN).  ...  Acknowledgment This research is supported by the Fundamental Research Funds for the Central Universities (Grant no. WK2350000002).  ... 
arXiv:2203.01594v1 fatcat:bbtzqoczhngutjewmssxfsuufe

Image Description Generator using Deep Learning

Deepak R Ksheerasagar
2022 International Journal for Research in Applied Science and Engineering Technology  
The CNN-LSTM architecture combines a Convolutional Neural Network (CNN), which creates features that describe the images, with a Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN).  ...  image information for social media  ...  We show how to use recurrent neural networks (RNNs) effectively to produce natural-language (English) captions for the system's input images.  ... 
doi:10.22214/ijraset.2022.45988 fatcat:vyioynpqqzav3lqqzncnax5lya
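As a rough illustration only (not the implementation of any paper in this listing), the CNN-encoder / LSTM-decoder pipeline these abstracts describe can be sketched in a few lines of NumPy: pooled CNN activations initialize the decoder state, and words are greedily fed back in until an end token. The vocabulary, dimensions, and weights below are illustrative stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; <start>/<end> tokens frame the generated caption.
vocab = ["<start>", "<end>", "a", "dog", "runs"]
V, H, F = len(vocab), 8, 16  # vocab size, LSTM hidden size, CNN feature size

# Randomly initialized weights stand in for trained parameters.
W_img = rng.normal(0, 0.1, (H, F))   # projects CNN features to the initial hidden state
Wx = rng.normal(0, 0.1, (4 * H, V))  # input-to-gates (one-hot word in)
Wh = rng.normal(0, 0.1, (4 * H, H))  # hidden-to-gates
b = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (V, H))   # hidden state -> vocabulary logits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM cell step: gates computed from the current word and previous state."""
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def generate_caption(cnn_features, max_len=10):
    """Greedy decoding: CNN features initialize the LSTM, words are fed back in."""
    h = np.tanh(W_img @ cnn_features)  # the image conditions the decoder
    c = np.zeros(H)
    word = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        x = np.zeros(V)
        x[word] = 1.0                     # one-hot embedding of the previous word
        h, c = lstm_step(x, h, c)
        word = int(np.argmax(W_out @ h))  # greedy: most likely next word
        if vocab[word] == "<end>":
            break
        caption.append(vocab[word])
    return caption

features = rng.normal(size=F)  # stands in for pooled CNN activations
print(generate_caption(features))
```

With untrained random weights the output is an arbitrary (but well-formed) word list; in a real system the weights come from training on caption datasets and the one-hot input is replaced by a learned word embedding.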

An Empirical Study of Language CNN for Image Captioning [article]

Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
2017 arXiv   pre-print
Language Models based on recurrent neural networks have dominated recent image caption generation tasks.  ...  In this paper, we introduce a Language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning.  ...  We gratefully acknowledge the support of NVAITC (NVIDIA AI Tech Centre) for our research at NTU ROSE Lab, Singapore.  ... 
arXiv:1612.07086v3 fatcat:u3keuw3dizaq5blfwsxgphjthi

Fully Convolutional CaptionNet: Siamese Difference Captioning Attention Model

Ariyo Oluwasanmi, Enoch Frimpong, Muhammad Umar Aftab, Edward Y. Baagyere, Zhiguang Qin, Kifayat Ullah
2019 IEEE Access  
INDEX TERMS Image captioning, deep learning, Siamese network, recurrent neural network, convolutional neural network, attention, fully convolutional networks.  ...  The generation of textual descriptions of the differences between images is a relatively new concept that requires the fusion of both computer vision and natural language techniques.  ...  Alongside the encoder, a recurrent neural network (RNN) is generally adopted as the decoder for generating captions automatically [16].  ... 
doi:10.1109/access.2019.2957513 fatcat:4t7nsc62tze5hjq56rmvhruzsm

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Ariyo Oluwasammi, Muhammad Umar Aftab, Zhiguang Qin, Son Tung Ngo, Thang Van Doan, Son Ba Nguyen, Son Hoang Nguyen, Giang Hoang Nguyen, Dan Selisteanu
2021 Complexity  
In this survey, we deliberate on the use of deep learning techniques for the segmentation analysis of both 2D and 3D images using a fully convolutional network and other high-level hierarchical feature  ...  of semantic image segmentation and image captioning approaches.  ...  A fully convolutional localization network (FCLN) was developed to determine important regions of interest in an image. The model combines a recurrent neural network language model and a convolutional neural  ... 
doi:10.1155/2021/5538927 fatcat:4yae4kjqdne6vaqus5plna4mwm

Attention-Based Deep Learning Model for Image Captioning: A Comparative Study

Phyu Phyu Khaing, May The` Yu (University of Computer Studies, Mandalay, Myanmar)
2019 International Journal of Image Graphics and Signal Processing  
Image captioning by applying a deep learning model can enhance description accuracy. Attention mechanisms are the upward trend in deep learning models for image caption generation.  ...  This also discusses the datasets for image captioning and the evaluation metrics used to test accuracy.  ...  This used a local two-dimensional convolutional neural network for the dynamic weighted sum, an LSTM encoder for visual-feature and word-embedding feature extraction, and multimodal embedding for mapping the  ... 
doi:10.5815/ijigsp.2019.06.01 fatcat:j5xryx3frva3jm7htrpbmpjvsu
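The soft-attention step that several of these results mention (scoring image regions against the decoder state and taking a weighted sum) can be sketched minimally as follows. This is a generic illustration, not the model of any particular paper above; the dimensions and the alignment matrix `W` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_attention(regions, h, W):
    """Soft attention: score each image region against the decoder state,
    normalize with softmax, and return the weighted context vector."""
    scores = regions @ (W @ h)       # one scalar relevance score per region
    w = np.exp(scores - scores.max())
    alpha = w / w.sum()              # attention weights sum to 1
    return alpha @ regions, alpha    # context vector, attention weights

R, F, H = 49, 32, 16                 # e.g. a 7x7 grid of CNN feature vectors
regions = rng.normal(size=(R, F))    # stands in for spatial CNN activations
h = rng.normal(size=H)               # current decoder hidden state
W = rng.normal(0, 0.1, (F, H))       # learned alignment matrix (random here)

context, alpha = soft_attention(regions, h, W)
```

The context vector is then typically concatenated with the word embedding and fed into the LSTM at each decoding step, so different regions can dominate for different words.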

Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning [article]

Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada
2016 arXiv   pre-print
The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de-facto standard for image captioning  ...  We propose to incorporate coding with a vector of locally aggregated descriptors (VLAD) on a spatial pyramid for CNN features of sub-regions in order to generate image representations that better reflect the  ...  Related Work A majority of recent work on the image captioning task has been dominated by the use of convolutional and recurrent neural networks for feature extraction and caption generation, respectively  ... 
arXiv:1603.09046v1 fatcat:sasstlz7jfcpdgfheasgelos2y

Referring Expression Object Segmentation with Caption-Aware Consistency [article]

Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang
2019 arXiv   pre-print
To better communicate between the language and visual modules, we employ a caption generation network that takes features shared across both domains as input, and improves both representations via a consistency  ...  In this work, we focus on segmenting the object in an image specified by a referring expression.  ...  and use a fully convolutional network for foreground/background segmentation by using both the language and visual features.  ... 
arXiv:1910.04748v1 fatcat:wqscmarouven7hqymg5w2qscqu

Structure Preserving Convolutional Attention for Image Captioning

Shichen Lu, Ruimin Hu, Jing Liu, Longteng Guo, Fei Zheng
2019 Applied Sciences  
In this paper, we propose a convolutional attention module that can preserve the spatial structure of the image by performing the convolution operation directly on the 2D feature maps.  ...  In the task of image captioning, learning the attentive image regions is necessary to adaptively and precisely focus on the object semantics relevant to each decoded word.  ...  Usually, a Convolutional Neural Network (CNN) is used to encode visual features and a recurrent neural network (RNN) is used to generate a caption [7, 8].  ... 
doi:10.3390/app9142888 fatcat:ljxjwxgra5hqzogydnfez4jjc4

Deep Learning-Based Image Retrieval System with Clustering on Attention-Based Representations

Sumanth S. Rao, Shahid Ikram, Parashara Ramesh
2021 SN Computer Science  
better textual representation, and the use of various baseline convolution-based stacks for better image representation.  ...  We have conducted various experiments to improve the representations of the image and the caption in the latent space for better correlation, e.g., the use of bidirectional sequence models for  ...  We have experimented with several variations of recurrent networks, word-embedding models, and convolutional neural network stacks for transfer learning and, furthermore, fused the  ... 
doi:10.1007/s42979-021-00563-2 fatcat:uqfy5ztvu5bchaayd2ezlgfsgq

Image Description using Attention Mechanism

2019 International journal of recent technology and engineering  
There are different approaches for automated image captioning which explain the image contents along with a complete understanding of the image, rather than just simply classifying it into a particular  ...  Image Description involves generating a textual description of images, which is essential for the problem of image understanding.  ...  [2] have utilized a combination of a deep convolutional network with recurrent networks for sequence modeling, a single network that produces descriptions of images.  ... 
doi:10.35940/ijrte.b1555.0982s1119 fatcat:qrkqpwq6tnghnfuwue67cdr6ey

2020 Index IEEE Transactions on Multimedia Vol. 22

2020 IEEE transactions on multimedia  
Jevremovic, A., see Kostic, Z., TMM July 2020 1904-1916; Ji, Q., see Wang, S., TMM April 2020 1084-1097; Jia, K., see 1345-1357; Jia, Y., see 2138-2148; Jian, M., Dong, J.,  ...  Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-Expressions., +, TMM March 2020 730-743  ... 
doi:10.1109/tmm.2020.3047236 fatcat:llha6qbaandfvkhrzpe5gek6mq

Automated image captioning with deep neural networks

Abdullah Ahmad Zarir, Saad Bashar, Amelia Ritahani Ismail
2020 Science in Information Technology Letters  
This combination of neural networks is better suited for generating a caption from an image.  ...  The language translation model uses an RNN for both encoding and decoding, whereas this model uses a Convolutional Neural Network (CNN) for encoding and an RNN for decoding.  ... 
doi:10.31763/sitech.v1i1.31 fatcat:syqrubnksja37bdvzmh4k4xja4

Video Captioning with Transferred Semantic Attributes

Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Most recent progress on this problem has been achieved by employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence  ...  each other for captioning.  ...  Gated-Recurrent-Unit Recurrent Convolutional Networks (GRU-RCN) [1]: GRU-RCN leverages a convolutional GRU-RNN to extract visual representations and generates sentences based on an LSTM text generator with soft-attention  ... 
doi:10.1109/cvpr.2017.111 dblp:conf/cvpr/PanYLM17 fatcat:mlmll73movgs5hacijd4omiyve