37,861 Hits in 6.0 sec

Simple Image Description Generator via a Linear Phrase-Based Approach [article]

Remi Lebret and Pedro O. Pinheiro and Ronan Collobert
2015 arXiv   pre-print
Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred.  ...  In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions.  ...  ACKNOWLEDGEMENTS This work was supported by the HASLER foundation through the grant "Information and Communication Technology for a Better World 2020" (SmartWorld).  ... 
arXiv:1412.8419v3 fatcat:egazplijuzexlpi2sstxcj2j6m

Collective Generation of Natural Image Descriptions

Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, Yejin Choi
2012 Annual Meeting of the Association for Computational Linguistics  
More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for  ...  We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web.  ...  To conclude, we have presented a collective approach to generating natural image descriptions.  ... 
dblp:conf/acl/KuznetsovaOBBC12 fatcat:5xanpftna5eo3eetyqxfc6sycu

Composing Simple Image Descriptions using Web-scale N-grams

Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, Yejin Choi
2011 Conference on Computational Natural Language Learning  
We present a simple yet effective approach to automatically compose image descriptions given computer-vision-based inputs and using web-scale n-grams.  ...  Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image, while permitting creativity in the description - making for  ...  [Step II] - Phrase Fusion: Given the expanded sets of phrases O1, O2, and R described above, we perform phrase fusion to generate a simple image description.  ...
dblp:conf/conll/LiKBBC11 fatcat:6vhprm4sxbfkpba2bshght6ina
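The phrase-fusion idea in the snippet above can be sketched as follows. This is a toy illustration, not the paper's actual method: the phrase sets, bigram counts, and scoring function are all invented stand-ins for web-scale n-gram statistics.

```python
from itertools import product

# Candidate phrase sets, loosely analogous to the paper's object (O),
# verb, and scene phrase pools. All values below are hypothetical.
objects = ["a black dog", "a dog"]
verbs   = ["runs", "sits"]
scenes  = ["on the beach", "in the sky"]

# Toy bigram counts standing in for web-scale n-gram frequencies.
bigram = {("dog", "runs"): 50, ("dog", "sits"): 30,
          ("runs", "on"): 40, ("runs", "in"): 5,
          ("sits", "on"): 20, ("sits", "in"): 2}

def fluency(o, v, s):
    # Score a composition by the n-gram frequency of its phrase junctions:
    # last word of the object phrase -> verb, verb -> first word of scene.
    return (bigram.get((o.split()[-1], v), 0)
            + bigram.get((v, s.split()[0]), 0))

# Fuse phrases by picking the most fluent combination.
best = max(product(objects, verbs, scenes), key=lambda t: fluency(*t))
sentence = " ".join(best)
```

With these toy counts the fusion prefers "runs" over "sits" and "on the beach" over "in the sky", because those junctions are the most frequent.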

Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images

Yoshitaka Ushiku, Masataka Yamaguchi, Yusuke Mukuta, Tatsuya Harada
2015 IEEE International Conference on Computer Vision (ICCV)  
A caption for an input image can be generated by connecting estimated phrases using a grammar model.  ...  Recent works focus on descriptive phrases, such as "a white dog" to explain the visual composites of an input image.  ...  Discussion of the phrase approach This subsection justifies the use of a phrase-based approach by comparing it to a template-based approach.  ... 
doi:10.1109/iccv.2015.306 dblp:conf/iccv/UshikuYMH15 fatcat:vgy6fdt56rdqfhu5phniyr7obe

Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing [article]

Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay
2022 arXiv   pre-print
Further, we propose a self-supervised joint vision–language approach with a focus on better text modelling.  ...  We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision–language processing.  ...  Fine-tuning is done via linear probing, i.e. only the last linear layer is trained.  ... 
arXiv:2204.09817v4 fatcat:c72thidabfbgpgln7yjeoeu6ae
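Linear probing, mentioned in the snippet above, means freezing a pretrained encoder and fitting only one linear layer on its features. A minimal sketch, with an invented stand-in encoder and synthetic data in place of the paper's vision–language model:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(x):
    # Hypothetical stand-in for a pretrained encoder; in linear probing
    # its weights are fixed and never updated during fine-tuning.
    W = np.linspace(-1, 1, x.shape[1] * 8).reshape(x.shape[1], 8)
    return np.tanh(x @ W)

X = rng.normal(size=(100, 4))      # synthetic raw inputs
y = (X[:, 0] > 0).astype(int)      # synthetic binary labels

feats = frozen_encoder(X)          # features from the frozen encoder

# The "probe": a single linear layer fit in closed form by least squares
# on +/-1 targets (the only trainable parameters).
A = np.hstack([feats, np.ones((len(feats), 1))])   # add a bias column
w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)

preds = (A @ w > 0).astype(int)
acc = (preds == y).mean()
```

Because only `w` is trained, probe accuracy measures how linearly separable the task is in the frozen representation, which is why it is a common evaluation for pretrained features.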

Reasoning about Fine-grained Attribute Phrases using Reference Games [article]

Jong-Chyi Su, Chenyun Wu, Huaizu Jiang, Subhransu Maji
2017 arXiv   pre-print
Moreover, due to the compositionality of attribute phrases, the trained listeners can interpret descriptions not seen during training for image retrieval, and the speakers can generate attribute-based  ...  Data collected in a pairwise manner improves the ability of the speaker to generate, and the ability of the listener to interpret visual descriptions.  ...  Acknowledgement: This research was supported in part by the NSF grants 1617917 and 1661259, and a faculty gift from Facebook.  ... 
arXiv:1708.08874v1 fatcat:dlxudfbeqnadzai3bjtxiz4pnm

Large Scale Retrieval and Generation of Image Descriptions

Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daumé, Alexander C. Berg (+2 others)
2015 International Journal of Computer Vision  
To do this we develop data-driven approaches for image description generation, using retrieval-based techniques to gather either: (a) whole captions associated with a visually similar image, or (b) relevant bits of text (phrases) from a large collection of image+description pairs.  ...  The second uses the phrases as features for text-based image search. Data-driven approaches to generation require a set of captioned photographs.  ... 
doi:10.1007/s11263-015-0840-y fatcat:o3vyf24rm5g2fnvrbwdpnhpfsi
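Strategy (a) above, whole-caption retrieval, reduces to a nearest-neighbor lookup: borrow the caption of the most visually similar captioned image. A minimal sketch with invented feature vectors and captions in place of real visual descriptors:

```python
import numpy as np

# Hypothetical captioned collection: caption -> visual feature vector.
# In the actual approach these would be descriptors of web images.
database = {
    "a dog plays fetch in a park": np.array([0.9, 0.1, 0.0]),
    "a sunset over the ocean":     np.array([0.0, 0.2, 0.9]),
    "two people ride bicycles":    np.array([0.1, 0.8, 0.2]),
}

def retrieve_caption(query_feat):
    # Return the caption of the visually most similar image
    # (cosine similarity over the whole collection).
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(database, key=lambda cap: cos(database[cap], query_feat))

caption = retrieve_caption(np.array([0.8, 0.2, 0.1]))
```

Strategy (b) differs only in granularity: instead of transferring a whole caption, phrases are retrieved from many neighbors and recombined.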

Learning to Interpret and Describe Abstract Scenes

Luis Gilberto Mateos Ortiz, Clemens Wolff, Mirella Lapata
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We demonstrate that this approach produces human-like scene descriptions which are both fluent and relevant, outperforming a number of competitive alternatives based on templates, sentence-based retrieval  ...  We propose a model inspired by machine translation operating over a large parallel corpus of visual relations and linguistic descriptions.  ...  Phrase-based approaches are more involved in that phrases need to be composed into a description and extraneous information optionally removed.  ... 
doi:10.3115/v1/n15-1174 dblp:conf/naacl/OrtizWL15 fatcat:gky657bg6bddthes4ci6jrrutq

Describing Textures using Natural Language [article]

Chenyun Wu, Mikayla Timm, Subhransu Maji
2020 arXiv   pre-print
In this paper, we study the problem of describing visual attributes of texture on a novel dataset containing rich descriptions of textures, and conduct a systematic study of current generative and discriminative  ...  We provide critical analysis of existing models by generating synthetic but realistic textures with different descriptions.  ...  To model the relation between texture images and their descriptions we investigate a discriminative approach, a metric-learning based approach, and a generative modeling based approach [55] on our dataset  ... 
arXiv:2008.01180v1 fatcat:mod3tmfajrhcplfm4vwzuefaqe

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures [article]

Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
2017 arXiv   pre-print
In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or as a retrieval problem over a visual  ...  Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.  ...  Finally, a description is generated from these collected phrases for each detected object via integer linear programming (ILP) which considers factors such as word ordering, redundancy, etc.  ... 
arXiv:1601.03896v2 fatcat:lbifbktev5dtbhx4obldg4t5x4
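The ILP objective mentioned in the snippet above balances phrase salience against redundancy. As a toy illustration, the same objective can be optimized by brute force over small phrase sets; a real system would pose this as an integer linear program and use a solver. All phrases and scores below are invented:

```python
from itertools import permutations

# Hypothetical candidate phrases with salience scores.
phrases = ["a brown dog", "on the grass", "a dog", "is running"]
salience = {"a brown dog": 3.0, "on the grass": 2.0,
            "a dog": 1.0, "is running": 2.5}

def redundancy(p, q):
    # Crude redundancy penalty: shared words suggest overlapping phrases.
    return len(set(p.split()) & set(q.split()))

def score(seq):
    # Objective in the spirit of the ILP: reward salient phrases,
    # penalize pairwise redundancy within the chosen description.
    total = sum(salience[p] for p in seq)
    total -= sum(redundancy(p, q)
                 for i, p in enumerate(seq) for q in seq[i + 1:])
    return total

# Exhaustive search over all orderings of all non-empty subsets,
# standing in for the ILP solver.
best = max((seq for k in range(1, len(phrases) + 1)
            for seq in permutations(phrases, k)), key=score)
```

With these scores the redundancy term correctly drops "a dog", which overlaps with the more salient "a brown dog".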

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
2016 The Journal of Artificial Intelligence Research  
In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or as a retrieval problem over a visual  ...  Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.  ...  Finally, a description is generated from these collected phrases for each detected object via integer linear programming (ILP) which considers factors such as word ordering, redundancy, etc.  ... 
doi:10.1613/jair.4900 fatcat:4ozmxbm735hlbf3umh6rpgmloe

Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity

Xiaoshuai Sun, Jiewei Cao, Chao Li, Lei Zhu, Heng Tao Shen
2017 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
visual representation which automatically links generic lingual phrases to their related visual contents.  ...  In this paper, we present an automatic approach for on-line discovery of visual-lingual semantic fragments from weakly labeled Internet images.  ...  For example, image knowledge bases such as ImageNet (Deng et al. 2009) make it possible to visualize semantic entities defined as words or simple phrases.  ... 
doi:10.1609/aaai.v31i1.10490 fatcat:uwaj6jld6zhoxbvgeqjcjuvik4

Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing

Hamid Izadinia, Fereshteh Sadeghi, Santosh K. Divvala, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi
2015 IEEE International Conference on Computer Vision (ICCV)  
We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations.  ...  Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase table using minimal human supervision.  ...  In summary, our key contributions are: (i) we motivate segment-phrase table as a convenient inter-modal correspondence representation and present a simple approach that involves minimal human supervision  ... 
doi:10.1109/iccv.2015.10 dblp:conf/iccv/IzadiniaSDHCF15 fatcat:5vnr2nu4fvarvgru2a2pzvtzju

Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing [article]

Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi
2015 arXiv   pre-print
We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations.  ...  Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase table using minimal human supervision.  ...  In summary, our key contributions are: (i) we motivate segment-phrase table as a convenient inter-modal correspondence representation and present a simple approach that involves minimal human supervision  ... 
arXiv:1509.08075v1 fatcat:shim4l4bzvgw3luduyujaud564

Learning visually grounded words and syntax for a scene description task

Deb K. Roy
2002 Computer Speech and Language  
Using these structures, a planning algorithm integrates syntactic, semantic, and contextual constraints to generate natural and unambiguous descriptions of objects in novel scenes.  ...  In evaluations of semantic comprehension by human judges, the performance of automatically generated spoken descriptions was comparable to human generated descriptions.  ...  In this paper we develop a learning-based approach for creating spoken language generation systems.  ... 
doi:10.1016/s0885-2308(02)00024-4 fatcat:xb6eoa3hpfao3d24dgymsdcoea
Showing results 1 — 15 out of 37,861 results