4,333 Hits in 6.5 sec

A Comprehensive Survey of Automated Audio Captioning [article]

Xuenan Xu, Mengyue Wu, Kai Yu
2022-05-11 · arXiv · pre-print
Audio captioning requires recognizing the acoustic scene, primary audio events, and sometimes the spatial and temporal relationships between events in an audio clip. … Automated audio captioning, a task that mimics human perception and innovatively links audio processing and natural language processing, has seen much progress over the last few years. … The utility of semantic guidance has been explored in image and video captioning and achieved better performance [93], [94]. …
arXiv:2205.05357v1 · fatcat:wytjo7rphzdwloskiz34pv5r7m

A Review on Methods and Applications in Multimodal Deep Learning [article]

Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul
2022-02-18 · arXiv · pre-print
This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. … Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning. … [40] proposed an image caption generation framework with the guidance of Part of Speech (PoS). …
arXiv:2202.09195v1 · fatcat:wwxrmrwmerfabbenleylwmmj7y

Watch What You Just Said

Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso
2017 · Proceedings of the Thematic Workshops of ACM Multimedia 2017 (Thematic Workshops '17) · ACM Press
To obtain text-related image features for our attention model, we adopt the guiding Long Short-Term Memory (gLSTM) captioning architecture with CNN fine-tuning. … attention in image captioning. … This article solely reflects the opinions and conclusions of its authors and not DARPA, NSF, ARO nor Google. We sincerely thank Vikas Dhiman and Suren Kumar for their helpful discussions. …
doi:10.1145/3126686.3126717 · dblp:conf/mm/ZhouXKC17 · fatcat:eahiuwbhlrfothin5b4wmpwwxa

Watch What You Just Said: Image Captioning with Text-Conditional Attention [article]

Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso
2016-11-24 · arXiv · pre-print
To obtain text-related image features for our attention model, we adopt the guiding Long Short-Term Memory (gLSTM) captioning architecture with CNN fine-tuning. … attention in image captioning. … We show five randomly sampled words w.r.t. different parts of speech (noun, verb and adjective). Table 4 shows their top few nearest words. …
arXiv:1606.04621v3 · fatcat:hvjgo3hsijbn7hya7cjokqvhvi
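The two "Watch What You Just Said" entries above describe attention over image regions that is conditioned on the text generated so far. As a rough illustration only — not the papers' gLSTM formulation, and with all names and dimensions made up — such a step can be sketched as scoring each region feature against the embedding of the previously emitted word:

```python
import numpy as np

def text_conditional_attention(regions, prev_word_emb, W_img, W_txt, v):
    """Illustrative sketch of text-conditional attention (hypothetical
    weights, not the authors' exact model): each image region is scored
    against the previous word embedding, then regions are pooled by the
    resulting softmax weights."""
    # regions: (k, d_img); prev_word_emb: (d_txt,)
    scores = np.tanh(regions @ W_img + prev_word_emb @ W_txt) @ v  # (k,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                       # softmax attention weights
    context = alpha @ regions                  # (d_img,) attended image feature
    return alpha, context

rng = np.random.default_rng(0)
k, d_img, d_txt, d_att = 5, 8, 6, 4
alpha, ctx = text_conditional_attention(
    rng.standard_normal((k, d_img)), rng.standard_normal(d_txt),
    rng.standard_normal((d_img, d_att)), rng.standard_normal((d_txt, d_att)),
    rng.standard_normal(d_att))
```

Because the gate into the softmax depends on the previous word, the same image yields different attention maps at different decoding steps, which is the core idea these papers exploit.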

On Controlled DeEntanglement for Natural Language Processing [article]

SaiKrishna Rallabandi
2019-09-22 · arXiv · pre-print
I conclude this writeup with a roadmap of experiments that show the applicability of this framework to scalability, flexibility and interpretability. … The latest addition to the toolbox of the human species is Artificial Intelligence (AI). … I have thus far worked on image captioning in the context of global control and emphatic text to speech in the context of fine-grained control. …
arXiv:1909.09964v1 · fatcat:mi5wm7pnxrddplwluwyqlauuoe

Multimodal machine translation through visuals and speech

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
2020-08-13 · Machine Translation · Springer Science and Business Media LLC
These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. … The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality … We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. …
doi:10.1007/s10590-020-09250-0 · fatcat:jod3ghcsnnbipotcqp6sme4lna

Context and Attribute Grounded Dense Captioning [article]

Guojun Yin and Lu Sheng and Bin Liu and Nenghai Yu and Xiaogang Wang and Jing Shao
2019-04-02 · arXiv · pre-print
… of the learned captions. … context in the input image. … As for contextual learning for image captioning, Yao et al. …
arXiv:1904.01410v1 · fatcat:ebnw6mi3e5grbhjaachfakuqae

Multimodal Machine Translation through Visuals and Speech

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
2019-11-28 · Zenodo
These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. … The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality … We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. …
doi:10.5281/zenodo.3690791 · fatcat:otdy5i33fzfsnnbb3xgb6zph6q

Context and Attribute Grounded Dense Captioning

Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
2019 · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · IEEE
… of the learned captions. … context in the input image. … As for contextual learning for image captioning, Yao et al. …
doi:10.1109/cvpr.2019.00640 · dblp:conf/cvpr/YinSLYWS19 · fatcat:n5raf2eb6vddrirujmy52t7dge

Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance [article]

Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger
2017-10-17 · arXiv · pre-print
Evaluations show that our models outperform most of the prior work for out-of-domain captioning on MSCOCO and are useful for integration of knowledge and vision in general. … semantic attention and constrained inference in the caption generation model for describing images that depict unseen/novel objects. … [11] a multiword-label classifier is built using the caption aligned to an image by extracting part-of-speech (POS) tags such as nouns, verbs and adjectives attained for each word. …
arXiv:1710.06303v1 · fatcat:mu6zbevjbvd2jfl6sjd6yqmisy

Localization and web accessibility

Emmanuelle Gutiérrez y Restrepo, Loïc Martínez Normand
2010-07-31 · Tradumàtica: tecnologies de la traducció · Universitat Autònoma de Barcelona
Web content accessibility for people with functional diversity is essential for building and integrating society. … The language has to be determined for both the content of the page and for parts of this content that use a different language (except for proper names). … The third layer of guidance consists of a set of success criteria. For each guideline, several success criteria are provided. …
doi:10.5565/rev/tradumatica.106 · fatcat:mdvwfn3dynh7dci4tjtpr3ljoa

Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation

Liu Zhang, Chao Shu, Jin Guo, Hanyi Zhang, Cheng Xie, Qing Liu
2020-03-03 · Electronics · MDPI
The proposed approach is also integrated into a commercial application to generate expert comments for children's oral evaluation. … Traditionally, the Scoring Rubric is widely used in oral evaluation for providing a ranking score by assessing word accuracy, phoneme accuracy, fluency, and accent position of a tester. … The sequence-to-sequence learning model for generating image captions has become popular, but systems for generating audio captions in the speech field are indeed rare. …
doi:10.3390/electronics9030424 · fatcat:bsd5ua4aufbgpg2ykc7vcej2o4

Video Captioning with Guidance of Multimodal Latent Topics

Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
2017 · Proceedings of the 2017 ACM on Multimedia Conference (MM '17) · ACM Press
As for the caption task, we propose a novel topic-aware decoder to generate more accurate and detailed video descriptions with the guidance from latent topics. … For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos. … Hence, we argue that for video captioning, the guidance from latent topics might be superior to detected semantic concepts for the following reasons: 1) videos contain more objects than images but many …
doi:10.1145/3123266.3123420 · dblp:conf/mm/ChenCJH17 · fatcat:st3ogxnthbczhnr7kygbgf7psu

Exploring Video Captioning Techniques: A Comprehensive Survey on Deep Learning Methods

Saiful Islam, Aurpan Dash, Ashek Seum, Amir Hossain Raj, Tonmoy Hossain, Faisal Muhammad Shah
2021-02-27 · SN Computer Science · Springer Science and Business Media LLC
Regarding dataset usage, so far, MSVD and MSR-VTT are dominant, owing to their part in outstanding results among various captioning models. … Despite rapid advancement, our survey reveals that video captioning research still has a lot to develop in accessing the full potential of deep learning for classifying and captioning a large number … Both parts have a greater challenge than image captioning. …
doi:10.1007/s42979-021-00487-x · fatcat:uk75jtc4yngcpfmb5hzk4eqx7u

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network [article]

Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
2019-08-27 · arXiv · pre-print
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos. … Such POS information not only boosts the video captioning performance but also improves the diversity of the generated captions. Our code is at: https://github.com/vsislab/Controllable_XGating. … Acknowledgments: The authors would like to thank the anonymous reviewers for the constructive comments to improve the paper. This work was supported in part by the National …
arXiv:1908.10072v1 · fatcat:syq26lejvbdm5fp53gbm6qg4sq
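The entry above mentions a gated fusion of multiple video representations. As a minimal sketch — not the paper's actual network, and with hypothetical weight names and shapes — a learned sigmoid gate can blend two feature streams per dimension:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, W_g, b_g):
    """Illustrative gated fusion of two feature vectors (hypothetical
    parameters, not the paper's exact architecture): a sigmoid gate
    computed from both streams decides, per dimension, how much of
    each stream to keep."""
    g = sigmoid(np.concatenate([feat_a, feat_b]) @ W_g + b_g)  # gate in (0, 1)^d
    return g * feat_a + (1.0 - g) * feat_b                     # convex per-dim blend

rng = np.random.default_rng(1)
d = 6
fused = gated_fusion(rng.standard_normal(d), rng.standard_normal(d),
                     rng.standard_normal((2 * d, d)), np.zeros(d))
```

Because the gate is a convex combination, each fused dimension stays between the two input streams, which makes this a common way to merge, e.g., appearance and motion features before decoding.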
Showing results 1–15 of 4,333.