A Comprehensive Survey of Automated Audio Captioning
[article]
<span title="2022-05-11">2022</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
Audio captioning requires recognizing the acoustic scene, primary audio events and sometimes the spatial and temporal relationship between events in an audio clip. ...
Automated audio captioning, a task that mimics human perception as well as innovatively links audio processing and natural language processing, has overseen much progress over the last few years. ...
The utility of semantic guidance has been explored in image and video captioning and achieved better performance [93] , [94] . ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2205.05357v1">arXiv:2205.05357v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wytjo7rphzdwloskiz34pv5r7m">fatcat:wytjo7rphzdwloskiz34pv5r7m</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220514034849/https://arxiv.org/pdf/2205.05357v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/34/80/348051b4e0021422b0de67fdb1a1eed20e5dfc67.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2205.05357v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
A Review on Methods and Applications in Multimodal Deep Learning
[article]
<span title="2022-02-18">2022</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. ...
Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning. ...
[40] proposed an image caption generation framework with the guidance of Part of Speech (PoS). ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2202.09195v1">arXiv:2202.09195v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wwxrmrwmerfabbenleylwmmj7y">fatcat:wwxrmrwmerfabbenleylwmmj7y</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220508045925/https://arxiv.org/pdf/2202.09195v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/65/a0/65a01b760850d82505c2a04faf84a3e8c50398fe.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2202.09195v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
Watch What You Just Said
<span title="">2017</span>
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/lahlxihmo5fhzpexw7rundu24u" style="color: black;">Proceedings of the on Thematic Workshops of ACM Multimedia 2017 - Thematic Workshops '17</a>
</i>
To obtain text-related image features for our attention model, we adopt the guiding Long Short-Term Memory (gLSTM) captioning architecture with CNN fine-tuning. ...
attention in image captioning. ...
This article solely reflects the opinions and conclusions of its authors and not DARPA, NSF, ARO nor Google. We sincerely thank Vikas Dhiman and Suren Kumar for their helpful discussions. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3126686.3126717">doi:10.1145/3126686.3126717</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/mm/ZhouXKC17.html">dblp:conf/mm/ZhouXKC17</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/eahiuwbhlrfothin5b4wmpwwxa">fatcat:eahiuwbhlrfothin5b4wmpwwxa</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190218140444/https://static.aminer.org/pdf/20170130/pdfs/mm/ywffqdn4pztb1tj0e6sj7zaco3cluxua.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/9f/85/9f85540a8e81596a1df0cbed1a97017d0d0efe7b.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3126686.3126717">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
acm.org
</button>
</a>
Watch What You Just Said: Image Captioning with Text-Conditional Attention
[article]
<span title="2016-11-24">2016</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
To obtain text-related image features for our attention model, we adopt the guiding Long Short-Term Memory (gLSTM) captioning architecture with CNN fine-tuning. ...
attention in image captioning. ...
We show five randomly sampled words w.r.t. different parts of speech (noun, verb and adjective). Table 4 shows their top few nearest words. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1606.04621v3">arXiv:1606.04621v3</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hvjgo3hsijbn7hya7cjokqvhvi">fatcat:hvjgo3hsijbn7hya7cjokqvhvi</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200901080341/https://arxiv.org/pdf/1606.04621v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/12/9d/129d716338dd2eb7712b53a406b508e277aab469.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1606.04621v3" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
On Controlled DeEntanglement for Natural Language Processing
[article]
<span title="2019-09-22">2019</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
I conclude this writeup with a roadmap of experiments that show the applicability of this framework to scalability, flexibility and interpretability. ...
The latest addition to the toolbox of the human species is Artificial Intelligence (AI). ...
I have thus far worked on image captioning in the context of global control and emphatic text to speech in the context of fine grained control. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1909.09964v1">arXiv:1909.09964v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mi5wm7pnxrddplwluwyqlauuoe">fatcat:mi5wm7pnxrddplwluwyqlauuoe</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200928161417/https://arxiv.org/pdf/1909.09964v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/aa/c0/aac0171415172515c05d14b4e931254437d3f8f6.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1909.09964v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
Multimodal machine translation through visuals and speech
<span title="2020-08-13">2020</span>
<i title="Springer Science and Business Media LLC">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/y5uhxlnzh5g2xmblon2lrbuzjm" style="color: black;">Machine Translation</a>
</i>
These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. ...
The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality ...
We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s10590-020-09250-0">doi:10.1007/s10590-020-09250-0</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jod3ghcsnnbipotcqp6sme4lna">fatcat:jod3ghcsnnbipotcqp6sme4lna</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210715142936/https://acris.aalto.fi/ws/portalfiles/portal/56741604/Sulubacak2020_Article_MultimodalMachineTranslationTh.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/51/ca/51caf0f43d46a05354462c8c4a1b032a380e6276.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s10590-020-09250-0">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="unlock alternate icon" style="background-color: #fb971f;"></i>
springer.com
</button>
</a>
Context and Attribute Grounded Dense Captioning
[article]
<span title="2019-04-02">2019</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
of the learned captions. ...
context in the input image. ...
As for contextual learning for image captioning, Yao et al. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1904.01410v1">arXiv:1904.01410v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ebnw6mi3e5grbhjaachfakuqae">fatcat:ebnw6mi3e5grbhjaachfakuqae</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200906080858/https://arxiv.org/pdf/1904.01410v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/62/84/6284c999e07c233bdce48ac351e7c99a5a69205a.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1904.01410v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
Multimodal Machine Translation through Visuals and Speech
<span title="2019-11-28">2019</span>
<i title="Zenodo">
Zenodo
</i>
These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. ...
The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality ...
We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5281/zenodo.3690791">doi:10.5281/zenodo.3690791</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/otdy5i33fzfsnnbb3xgb6zph6q">fatcat:otdy5i33fzfsnnbb3xgb6zph6q</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200301194542/https://zenodo.org/record/3690791/files/1911.12798.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/a6/f6/a6f62d2365aa63f5d9c90893ab8aaa25551276fe.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5281/zenodo.3690791">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="unlock alternate icon" style="background-color: #fb971f;"></i>
zenodo.org
</button>
</a>
Context and Attribute Grounded Dense Captioning
<span title="">2019</span>
<i title="IEEE">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ilwxppn4d5hizekyd3ndvy2mii" style="color: black;">2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</a>
</i>
of the learned captions. ...
context in the input image. ...
As for contextual learning for image captioning, Yao et al. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/cvpr.2019.00640">doi:10.1109/cvpr.2019.00640</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/cvpr/YinSLYWS19.html">dblp:conf/cvpr/YinSLYWS19</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/n5raf2eb6vddrirujmy52t7dge">fatcat:n5raf2eb6vddrirujmy52t7dge</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190819021210/http://openaccess.thecvf.com:80/content_CVPR_2019/papers/Yin_Context_and_Attribute_Grounded_Dense_Captioning_CVPR_2019_paper.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/eb/a6/eba62fe8050e475ffe533b9f70db538074d8d0d1.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/cvpr.2019.00640">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
ieee.com
</button>
</a>
Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance
[article]
<span title="2017-10-17">2017</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
Evaluations show that our models outperform most of the prior work for out-of-domain captioning on MSCOCO and are useful for integration of knowledge and vision in general. ...
semantic attention and constrained inference in the caption generation model for describing images that depict unseen/novel objects. ...
[11] a multiword-label classifier is built using the caption aligned to an image by extracting part-of-speech (POS) tags such as nouns, verbs and adjectives attained for each word. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1710.06303v1">arXiv:1710.06303v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mu6zbevjbvd2jfl6sjd6yqmisy">fatcat:mu6zbevjbvd2jfl6sjd6yqmisy</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200824113038/https://arxiv.org/pdf/1710.06303v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/c1/27/c127ac138a22c155a79f362562a52c070e2b4022.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1710.06303v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
Localization and web accessibility
<span title="2010-07-31">2010</span>
<i title="Universitat Autonoma de Barcelona">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/bwr3vc2lrvhbpeueycwij5mlkm" style="color: black;">Tradumàtica tecnologies de la traducció</a>
</i>
Web content accessibility for people with functional diversity is essential for building and integrating society. ...
The language has to be determined for both the content of the page and for parts of this content that use a different language (except for proper names). ...
The third layer of guidance consists of a set of success criteria. For each guideline, several success criteria are provided. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5565/rev/tradumatica.106">doi:10.5565/rev/tradumatica.106</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mdvwfn3dynh7dci4tjtpr3ljoa">fatcat:mdvwfn3dynh7dci4tjtpr3ljoa</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190427010419/https://ddd.uab.cat/pub/tradumatica/15787559n8/15787559n8a10.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/3a/47/3a4749a87363b448ef94825a7a11a8f1f52f02cd.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5565/rev/tradumatica.106">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="unlock alternate icon" style="background-color: #fb971f;"></i>
Publisher / doi.org
</button>
</a>
Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation
<span title="2020-03-03">2020</span>
<i title="MDPI AG">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ikdpfme5h5egvnwtvvtjrnntyy" style="color: black;">Electronics</a>
</i>
The proposed approach is also integrated into a commercial application to generate expert comments for children's oral evaluation. ...
Traditionally, the Scoring Rubric is widely used in oral evaluation for providing a ranking score by assessing word accuracy, phoneme accuracy, fluency, and accent position of a tester. ...
The sequence-to-sequence learning model for generating image captions has become popular, but systems for generating audio captions in the speech field are indeed rare. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/electronics9030424">doi:10.3390/electronics9030424</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bsd5ua4aufbgpg2ykc7vcej2o4">fatcat:bsd5ua4aufbgpg2ykc7vcej2o4</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200307130742/https://res.mdpi.com/d_attachment/electronics/electronics-09-00424/article_deploy/electronics-09-00424.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/4d/55/4d5592439775e759d29e8d5c559d784029d5d936.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/electronics9030424">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="unlock alternate icon" style="background-color: #fb971f;"></i>
mdpi.com
</button>
</a>
Video Captioning with Guidance of Multimodal Latent Topics
<span title="">2017</span>
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/lahlxihmo5fhzpexw7rundu24u" style="color: black;">Proceedings of the 2017 ACM on Multimedia Conference - MM '17</a>
</i>
As for the caption task, we propose a novel topic-aware decoder to generate more accurate and detailed video descriptions with the guidance from latent topics. ...
For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos. ...
Hence, we argue that for video captioning, the guidance from latent topics might be superior to detected semantic concepts for the following reasons: 1) videos contain more objects than images but many ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3123266.3123420">doi:10.1145/3123266.3123420</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/mm/ChenCJH17.html">dblp:conf/mm/ChenCJH17</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/st3ogxnthbczhnr7kygbgf7psu">fatcat:st3ogxnthbczhnr7kygbgf7psu</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190218104428/https://static.aminer.org/pdf/20170130/pdfs/mm/ehkt2xwprwx8blyumdoo71avzfupqde4.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/2f/36/2f368035eab5f0451ea250884d87d66215774e80.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3123266.3123420">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
acm.org
</button>
</a>
Exploring Video Captioning Techniques: A Comprehensive Survey on Deep Learning Methods
<span title="2021-02-27">2021</span>
<i title="Springer Science and Business Media LLC">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/yzo2wjv2bbh2zo3zo5p7scalee" style="color: black;">SN Computer Science</a>
</i>
Regarding dataset usage, so far, MSVD and MSR-VTT are very much dominant due to be part of outstanding results among various captioning models. ...
Despite rapid advancement, our survey reveals that video captioning research-work still has a lot to develop in accessing the full potential of deep learning for classifying and captioning a large number ...
Both parts have a greater challenge than image captioning. ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s42979-021-00487-x">doi:10.1007/s42979-021-00487-x</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/uk75jtc4yngcpfmb5hzk4eqx7u">fatcat:uk75jtc4yngcpfmb5hzk4eqx7u</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210718062252/https://link.springer.com/content/pdf/10.1007/s42979-021-00487-x.pdf?error=cookies_not_supported&code=5d03278a-68bd-413c-a4f9-545614baf6ea" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/7d/5d/7d5d6ddc6e4bdb490e554bc9f121d39df07ffea8.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s42979-021-00487-x">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
springer.com
</button>
</a>
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
[article]
<span title="2019-08-27">2019</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos. ...
Such POS information not only boosts the video captioning performance but also improves the diversity of the generated captions. Our code is at: https://github.com/vsislab/Controllable_XGating. ...
Acknowledgments The authors would like to thank the anonymous reviewers for the constructive comments to improve the paper. This work was supported in part by the National ...
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1908.10072v1">arXiv:1908.10072v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/syq26lejvbdm5fp53gbm6qg4sq">fatcat:syq26lejvbdm5fp53gbm6qg4sq</a>
</span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200828122148/https://arxiv.org/pdf/1908.10072v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/5e/47/5e4742e510a26cd55b19d3ba191b688e7fb8f8cf.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1908.10072v1" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>
Showing results 1 — 15 out of 4,333 results