A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is application/pdf.
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
[article]
2019
arXiv
pre-print
In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. ...
Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell. ...
Conclusion We presented Show, Control and Tell, a framework for generating controllable and grounded captions through regions. ...
arXiv:1811.10652v3
fatcat:57qqqlljlfhddgkywosi6xaucy
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. ...
Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell. ...
Conclusion We presented Show, Control and Tell, a framework for generating controllable and grounded captions through regions. ...
doi:10.1109/cvpr.2019.00850
dblp:conf/cvpr/CorniaBC19
fatcat:nkfw7bfy3baydmrsaputlq6e7y
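To make the "controllable and grounded captions through regions" idea in the two Show, Control and Tell entries above concrete, here is a minimal, hypothetical sketch (PyTorch; not the authors' released code at the repository linked above): a decoder conditions each generated word on the current control region and learns a gate that decides when to advance to the next region in the user-supplied sequence.

```python
# Hypothetical sketch (not the authors' code): condition each word on the
# "current" control region; a learned gate decides when to move on to the
# next region in the ordered control sequence.
import torch
import torch.nn as nn

class RegionControlledDecoder(nn.Module):
    def __init__(self, vocab_size, region_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + region_dim, hidden_dim)
        self.word_head = nn.Linear(hidden_dim, vocab_size)   # next-word logits
        self.shift_gate = nn.Linear(hidden_dim, 1)            # P(advance to next region)

    def forward(self, regions, words):
        """regions: (num_regions, region_dim) ordered control sequence;
        words: (seq_len,) ground-truth word ids (teacher forcing)."""
        h = torch.zeros(1, self.lstm.hidden_size)
        c = torch.zeros(1, self.lstm.hidden_size)
        r_idx, logits, shifts = 0, [], []
        for t in range(words.size(0)):
            x = torch.cat([self.embed(words[t:t+1]), regions[r_idx:r_idx+1]], dim=-1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.word_head(h))
            p_shift = torch.sigmoid(self.shift_gate(h))
            shifts.append(p_shift)
            # Hard shift decision for illustration only (non-differentiable).
            if p_shift.item() > 0.5 and r_idx < regions.size(0) - 1:
                r_idx += 1
        return torch.cat(logits), torch.cat(shifts)

# Toy usage: 3 control regions, a 6-token caption.
dec = RegionControlledDecoder(vocab_size=1000)
out, shift_probs = dec(torch.randn(3, 2048), torch.randint(0, 1000, (6,)))
print(out.shape, shift_probs.shape)  # torch.Size([6, 1000]) torch.Size([6, 1])
```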
Move Forward and Tell: A Progressive Generator of Video Descriptions
[article]
2018
arXiv
pre-print
We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. ...
They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. ...
In this work, we develop a progressive generation framework that couples two recurrent networks, one for event selection and the other for caption generation. ...
arXiv:1807.10018v1
fatcat:xhay72cq4fhzjntpsgoveomvfy
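A rough sketch of the two-network coupling described in the Move Forward and Tell entry above, with illustrative shapes and names rather than the paper's own: one recurrent network scores clips to (softly) select the next event, and a second recurrent network generates the sentence conditioned on the selected event feature.

```python
# Illustrative only: event-selection RNN + caption-generation RNN.
import torch
import torch.nn as nn

class ProgressiveVideoDescriber(nn.Module):
    def __init__(self, vocab_size, clip_dim=1024, hidden_dim=512, embed_dim=256):
        super().__init__()
        # Recurrent event selector: scores each clip as the next event to describe.
        self.selector = nn.GRU(clip_dim, hidden_dim, batch_first=True)
        self.event_score = nn.Linear(hidden_dim, 1)
        # Recurrent caption generator, conditioned on the selected event feature.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.captioner = nn.GRU(embed_dim + clip_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, clips, words):
        """clips: (1, num_clips, clip_dim); words: (1, seq_len) for one sentence."""
        enc, _ = self.selector(clips)                        # (1, num_clips, hidden)
        scores = self.event_score(enc).squeeze(-1)           # (1, num_clips)
        weights = torch.softmax(scores, dim=-1)
        event = (weights.unsqueeze(-1) * clips).sum(dim=1)   # soft-selected event feature
        # Feed the event feature at every decoding step of this sentence.
        ev = event.unsqueeze(1).expand(-1, words.size(1), -1)
        dec_in = torch.cat([self.embed(words), ev], dim=-1)
        dec_out, _ = self.captioner(dec_in)
        return self.word_head(dec_out), weights              # word logits + event attention

model = ProgressiveVideoDescriber(vocab_size=5000)
logits, attn = model(torch.randn(1, 8, 1024), torch.randint(0, 5000, (1, 12)))
print(logits.shape, attn.shape)  # torch.Size([1, 12, 5000]) torch.Size([1, 8])
```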
Show, Edit and Tell: A Framework for Editing Image Captions
[article]
2020
arXiv
pre-print
Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. ...
However, editing existing captions can be easier than generating new ones from scratch. ...
Figures 5 and 6 show some results generated by our editing framework. ...
arXiv:2003.03107v1
fatcat:eeniqo5agngvnl6d5u5zz6jszi
From Show to Tell: A Survey on Deep Learning-based Image Captioning
[article]
2021
arXiv
pre-print
Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation. ...
This work aims at providing a comprehensive overview of image captioning approaches, from visual encoding and text generation to training strategies, datasets, and evaluation metrics. ...
We also want to thank the authors who provided us with the captions and model weights for some of the surveyed approaches. ...
arXiv:2107.06912v3
fatcat:ezhutcovnvh4reiweedfmxjlve
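Among the evaluation metrics the survey covers, n-gram overlap scores such as BLEU are the most common for captioning; the snippet below is a generic illustration using NLTK, which is an arbitrary implementation choice here rather than one prescribed by the survey.

```python
# Minimal BLEU-4 example: compare a generated caption against human references.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a man is riding a horse on the beach".split(),
    "a person rides a horse along the shore".split(),
]
candidate = "a man rides a horse on the beach".split()

# Smoothing helps because short captions often lack higher-order n-gram matches.
score = sentence_bleu(references, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")
```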
Move Forward and Tell: A Progressive Generator of Video Descriptions
[chapter]
2018
Lecture Notes in Computer Science
We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. ...
They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. ...
In this work, we develop a progressive generation framework that couples two recurrent networks, one for event selection and the other for caption generation. ...
doi:10.1007/978-3-030-01252-6_29
fatcat:twbxfif36rgtrcxa2m3wwbo2uu
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
[article]
2018
arXiv
pre-print
To that end, we first extract attributes and generate descriptions as explanations for an image using pre-trained attribute detectors and image captioning models, respectively. ...
The advantages of such a breakdown include: (1) the attributes and captions can reflect what the system extracts from the image, thus can provide some explanations for the predicted answer; (2) these intermediate ...
Figure 7 : A control case for comparing the accuracy when inputting captions of different quality. ...
arXiv:1801.09041v1
fatcat:kfvlwn4arzbcxedqlirtbttzem
Show Auto-adaptive and Tell: Learned From the SEM Image Challenge
2021
IEEE Access
Then, a triplet neural network with a proposed loss function is used to train the show auto-adaptive and tell model for SEM image analysis on 60% of the dataset, with testing on 30% and validation on 10%. ...
Firstly, we collected SEM images and corresponding captions from previous papers and built a database. ...
Baseline: We re-implement show and tell and show adaptive and tell as our baseline methods. Show and tell consists of an image feature extractor and a sentence generator. ...
doi:10.1109/access.2021.3068162
fatcat:jqweq3sxy5dipcvwyxscpbwnka
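The entry above mentions a triplet neural network with a proposed loss and a 60/30/10 data split; as an illustration only, the standard triplet margin loss below is a common starting point for such models, not the paper's exact formulation.

```python
# Standard triplet margin loss (illustrative, not the paper's custom loss).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor toward the positive embedding, push it from the negative."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy embeddings (e.g., SEM-image anchor, matching caption, non-matching caption).
a, p, n = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
print(triplet_loss(a, p, n).item())

# 60% train / 30% test / 10% validation split, as stated in the abstract.
indices = torch.randperm(1000)
train, test, val = indices[:600], indices[600:900], indices[900:]
print(len(train), len(test), len(val))  # 600 300 100
```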
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
To that end, we first extract attributes and generate descriptions as explanations for an image. Next, a reasoning module utilizes these explanations in place of the image to infer an answer. ...
The advantages of such a breakdown include: (1) the attributes and captions can reflect what the system extracts from the image, thus can provide some insights for the predicted answer; (2) these intermediate ...
Acknowledgements Jiebo Luo would like to thank the support of Adobe and NSF Award #1704309. ...
doi:10.18653/v1/d18-1164
dblp:conf/emnlp/LiFYML18
fatcat:6crgtokkezahdcftnsw5jtx3vi
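The two Tell-and-Answer entries describe a two-stage breakdown: the image is first turned into textual evidence (detected attributes and a generated caption), and a reasoning module then answers the question from that evidence alone. The sketch below uses stand-in bag-of-words encoders rather than the paper's pre-trained detectors and captioner.

```python
# Schematic of the explain-then-answer pipeline; all components are stand-ins.
import torch
import torch.nn as nn

class ExplainThenAnswer(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The reasoning module sees only text: question + attributes + caption.
        self.answer_head = nn.Sequential(
            nn.Linear(3 * embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, question_ids, attribute_ids, caption_ids):
        q = self.embed(question_ids).mean(dim=1)    # mean-pooled question tokens
        a = self.embed(attribute_ids).mean(dim=1)   # pooled detected attributes
        c = self.embed(caption_ids).mean(dim=1)     # pooled generated caption
        return self.answer_head(torch.cat([q, a, c], dim=-1))  # answer logits

model = ExplainThenAnswer(vocab_size=10000, num_answers=3000)
logits = model(torch.randint(0, 10000, (2, 8)),    # question tokens
               torch.randint(0, 10000, (2, 5)),    # attribute tokens
               torch.randint(0, 10000, (2, 12)))   # caption tokens
print(logits.shape)  # torch.Size([2, 3000])
```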
Drawing to Tell Versus Drawing to Intrigue?
2021
Visible Language
The author discusses Deacon's drawings and infers the potential of drawing as a methodology for anthropology. ...
In contrast to his expectations, Deacon found a culture in the process of decay. ...
... of language are a crucial point, which can also be used to find common ground between anthropology and visual communication. ...
doi:10.34314/vl.v55i3.4676
fatcat:t2f6j65fzffovji2k7mkk7prka
Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning
2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words. ...
In our model, each topic is integrated into a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. ...
[Dai et al., 2017] introduced a random vector for controlling the diversity of a sentence with generative adversarial nets. ...
doi:10.24963/ijcai.2018/592
dblp:conf/ijcai/MaoZWL18
fatcat:qqhpantaiba2xiz67crqoa5oza
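One plausible reading of the Fusion Gate Unit mentioned above (illustrative only; the paper's FGU details are not reproduced here) is a learned sigmoid gate that blends a topic embedding with the decoder state before the next word is predicted.

```python
# Hypothetical fusion gate: blend decoder state and topic embedding.
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    def __init__(self, hidden_dim, topic_dim):
        super().__init__()
        self.topic_proj = nn.Linear(topic_dim, hidden_dim)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, hidden, topic):
        """hidden: (B, hidden_dim) decoder state; topic: (B, topic_dim)."""
        t = torch.tanh(self.topic_proj(topic))
        g = torch.sigmoid(self.gate(torch.cat([hidden, t], dim=-1)))
        return g * hidden + (1.0 - g) * t   # gated blend steers the sentence topic

fgu = FusionGate(hidden_dim=512, topic_dim=128)
fused = fgu(torch.randn(4, 512), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 512])
```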
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
2017
IEEE Transactions on Pattern Analysis and Machine Intelligence
In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences ...
We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research, and provide ...
Also many thanks to Chris Shallue for driving the efforts to reimplement and open source our model in TensorFlow. ...
doi:10.1109/tpami.2016.2587640
pmid:28055847
fatcat:32dqcgfe3zf3jjdwbpvlsrygda
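For reference, a minimal CNN-encoder / LSTM-decoder captioner in the spirit of Show and Tell looks roughly as follows; the backbone and layer sizes are placeholders, and only the core idea of feeding the image embedding to the language model as its first input is kept.

```python
# Greatly simplified encoder-decoder captioner; not the competition model.
import torch
import torch.nn as nn
import torchvision.models as models

class ShowAndTell(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # weights=None keeps this runnable without downloads (older torchvision
        # versions use pretrained=False instead).
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        self.img_proj = nn.Linear(512, embed_dim)                  # resnet18 feature size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.cnn(images).flatten(1)              # (B, 512)
        img_token = self.img_proj(feats).unsqueeze(1)    # image fed as the first "word"
        seq = torch.cat([img_token, self.embed(captions)], dim=1)
        out, _ = self.lstm(seq)
        return self.word_head(out[:, 1:])                # predict words after the image token

model = ShowAndTell(vocab_size=10000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 15)))
print(logits.shape)  # torch.Size([2, 15, 10000])
```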
Tell Me Where to Look: Guided Attention Inference Network
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
In one common framework we address three shortcomings of previous approaches in modeling such attention maps: We (1) make attention maps an explicit and natural component of the end-to-end training for ...
Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance. ...
This research is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484 and U.S. Army Research Office Award W911NF-17-1-0367. ...
doi:10.1109/cvpr.2018.00960
dblp:conf/cvpr/LiWPE018
fatcat:7oqwakk4rvasbjwsyhtbdqgsja
Tell Me Where to Look: Guided Attention Inference Network
[article]
2018
arXiv
pre-print
These attention maps are then available as priors for tasks such as object localization and semantic segmentation. ...
Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance. ...
This research is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484 and U.S. Army Research Office Award W911NF-17-1-0367. ...
arXiv:1802.10171v1
fatcat:xca2ajzskrhald46jevzb5yyg4
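A simplified take on the "attention maps as an explicit component of end-to-end training" idea from the two entries above (inspired by, not a faithful re-implementation of, the paper): compute a class activation map, erase the attended region from the image, and penalize the classifier if it can still recognize the class in the erased image.

```python
# Illustrative attention-mining objective on a tiny classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttentionClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(64, num_classes)      # applied to pooled features

    def forward(self, x):
        fmap = self.features(x)                           # (B, 64, H, W)
        logits = self.classifier(fmap.mean(dim=(2, 3)))   # global average pooling
        return logits, fmap

def attention_mining_loss(model, images, labels):
    logits, fmap = model(images)
    cls_loss = F.cross_entropy(logits, labels)
    # Class activation map: weight feature maps by the target class's classifier weights.
    w = model.classifier.weight[labels]                   # (B, 64)
    cam = torch.einsum('bc,bchw->bhw', w, fmap)
    cam = torch.sigmoid(F.interpolate(cam.unsqueeze(1), size=images.shape[-2:],
                                      mode='bilinear', align_corners=False))
    erased = images * (1.0 - cam)                         # hide what the model attends to
    erased_logits, _ = model(erased)
    # If the erased image still scores high for the true class, the attention map
    # missed parts of the object; push that score down.
    mining_loss = erased_logits.gather(1, labels.unsqueeze(1)).mean()
    return cls_loss + 0.1 * mining_loss

model = TinyAttentionClassifier()
loss = attention_mining_loss(model, torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))
print(loss.item())
```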
What QCD tells us about nature — and why we should listen
2000
Nuclear Physics A
Then I visit a few of its current frontiers. Finally I draw some appropriate conclusions. ...
Nor does it mesh seamlessly with the considerably better tested and established framework we use for understanding the remainder of physics. ...
If the coupling is weak and the density large, our first approximation to the ground state is large Fermi balls for all the quarks. ...
doi:10.1016/s0375-9474(99)00567-9
fatcat:oytpa6e2krffxhp6anpzadtkmi
Showing results 1 — 15 out of 10,224 results