
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [article]

Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
2019 arXiv   pre-print
In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability.  ...  Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell.  ...  Conclusion We presented Show, Control and Tell, a framework for generating controllable and grounded captions through regions.  ... 
arXiv:1811.10652v3 fatcat:57qqqlljlfhddgkywosi6xaucy

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability.  ...  Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell.  ...  Conclusion We presented Show, Control and Tell, a framework for generating controllable and grounded captions through regions.  ... 
doi:10.1109/cvpr.2019.00850 dblp:conf/cvpr/CorniaBC19 fatcat:nkfw7bfy3baydmrsaputlq6e7y

Move Forward and Tell: A Progressive Generator of Video Descriptions [article]

Yilei Xiong, Bo Dai, Dahua Lin
2018 arXiv   pre-print
We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips.  ...  They typically treat an entire video as a whole and generate the caption conditioned on a single embedding.  ...  In this work, we develop a progressive generation framework that couples two recurrent networks, one for event selection and the other for caption generation.  ... 
arXiv:1807.10018v1 fatcat:xhay72cq4fhzjntpsgoveomvfy

Show, Edit and Tell: A Framework for Editing Image Captions [article]

Fawaz Sammani, Luke Melas-Kyriazi
2020 arXiv   pre-print
Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language.  ...  However, editing existing captions can be easier than generating new ones from scratch.  ...  Figure 5 and 6 show some results generated by our editing framework.  ... 
arXiv:2003.03107v1 fatcat:eeniqo5agngvnl6d5u5zz6jszi

From Show to Tell: A Survey on Deep Learning-based Image Captioning [article]

Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara
2021 arXiv   pre-print
Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation.  ...  This work aims at providing a comprehensive overview of image captioning approaches, from visual encoding and text generation to training strategies, datasets, and evaluation metrics.  ...  We also want to thank the authors who provided us with the captions and model weights for some of the surveyed approaches.  ... 
arXiv:2107.06912v3 fatcat:ezhutcovnvh4reiweedfmxjlve

Move Forward and Tell: A Progressive Generator of Video Descriptions [chapter]

Yilei Xiong, Bo Dai, Dahua Lin
2018 Lecture Notes in Computer Science  
We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips.  ...  They typically treat an entire video as a whole and generate the caption conditioned on a single embedding.  ...  In this work, we develop a progressive generation framework that couples two recurrent networks, one for event selection and the other for caption generation.  ... 
doi:10.1007/978-3-030-01252-6_29 fatcat:twbxfif36rgtrcxa2m3wwbo2uu

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions [article]

Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
2018 arXiv   pre-print
To that end, we first extract attributes and generate descriptions as explanations for an image using pre-trained attribute detectors and image captioning models, respectively.  ...  The advantages of such a breakdown include: (1) the attributes and captions can reflect what the system extracts from the image, thus can provide some explanations for the predicted answer; (2) these intermediate  ...  Figure 7 : A control case for comparing the accuracy when inputting captions of different quality.  ... 
arXiv:1801.09041v1 fatcat:kfvlwn4arzbcxedqlirtbttzem

Show Auto-adaptive and Tell: Learned From the SEM Image Challenge

Jing Su, Jing Li
2021 IEEE Access  
Then, a triplet neural network with a proposed loss function is used to train the show auto-adaptive and tell model on 60% of the dataset for SEM image analysis, with testing on 30% and validation on 10%.  ...  Firstly, we collected SEM images and their corresponding captions from previous papers and built a database.  ...  BASELINE We re-implement show and tell and show adaptive and tell as our baseline methods. Show and tell consists of an image feature extractor and a sentence generator.  ... 
doi:10.1109/access.2021.3068162 fatcat:jqweq3sxy5dipcvwyxscpbwnka

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
To that end, we first extract attributes and generate descriptions as explanations for an image. Next, a reasoning module utilizes these explanations in place of the image to infer an answer.  ...  The advantages of such a breakdown include: (1) the attributes and captions can reflect what the system extracts from the image, thus can provide some insights for the predicted answer; (2) these intermediate  ...  Acknowledgements Jiebo Luo would like to thank the support of Adobe and NSF Award #1704309.  ... 
doi:10.18653/v1/d18-1164 dblp:conf/emnlp/LiFYML18 fatcat:6crgtokkezahdcftnsw5jtx3vi

Drawing to Tell Versus Drawing to Intrigue?

Michael Renner
2021 Visible Language  
The author discusses Deacon's drawings and infers the potential of drawing as a methodology for anthropology.  ...  In contrast to his expectations, Deacon found a culture in the process of decay.  ...  of language are a crucial point, which can also be used to find a common ground between anthropology and visual communication.  ... 
doi:10.34314/vl.v55i3.4676 fatcat:t2f6j65fzffovji2k7mkk7prka

Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words.  ...  In our model, each topic is integrated to a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective.  ...  [Dai et al., 2017] introduced a random vector for controlling the diversity of a sentence with generative adversarial nets.  ... 
doi:10.24963/ijcai.2018/592 dblp:conf/ijcai/MaoZWL18 fatcat:qqhpantaiba2xiz67crqoa5oza

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
2017 IEEE Transactions on Pattern Analysis and Machine Intelligence  
In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences  ...  We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research, and provide  ...  Also many thanks to Chris Shallue for driving the efforts to reimplement and open source our model in TensorFlow.  ... 
doi:10.1109/tpami.2016.2587640 pmid:28055847 fatcat:32dqcgfe3zf3jjdwbpvlsrygda

Tell Me Where to Look: Guided Attention Inference Network

Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
In one common framework we address three shortcomings of previous approaches in modeling such attention maps: We (1) make attention maps an explicit and natural component of the end-to-end training for  ...  Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance.  ...  This research is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484 and U.S. Army Research Office Award W911NF-17-1-0367.  ... 
doi:10.1109/cvpr.2018.00960 dblp:conf/cvpr/LiWPE018 fatcat:7oqwakk4rvasbjwsyhtbdqgsja

Tell Me Where to Look: Guided Attention Inference Network [article]

Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu
2018 arXiv   pre-print
These attention maps are then available as priors for tasks such as object localization and semantic segmentation.  ...  Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance.  ...  This research is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484 and U.S. Army Research Office Award W911NF-17-1-0367.  ... 
arXiv:1802.10171v1 fatcat:xca2ajzskrhald46jevzb5yyg4

What QCD tells us about nature — and why we should listen

Frank Wilczek
2000 Nuclear Physics A  
Then I visit a few of its current frontiers. Finally I draw some appropriate conclusions.  ...  Nor does it mesh seamlessly with the considerably better tested and established framework we use for understanding the remainder of physics.  ...  If the coupling is weak and the density large, our first approximation to the ground state is large fermi balls for all the quarks.  ... 
doi:10.1016/s0375-9474(99)00567-9 fatcat:oytpa6e2krffxhp6anpzadtkmi
Showing results 1–15 of 10,224