
Discovering Phrase-Level Lexicon for Image Annotation [chapter]

Lei Yu, Jing Liu, Changsheng Xu
2010 Lecture Notes in Computer Science  
In image annotation, the annotation words are expected to represent image content at both the visual and the semantic level.  ...  In this paper, we attempt to find this kind of combination and construct a less ambiguous phrase-level lexicon for annotation.  ...  Related Work: Extensive research efforts have been devoted to automatic image annotation in recent years.  ... 
doi:10.1007/978-3-642-15702-8_17 fatcat:akbbvm7lb5evdbxhyqpkp6adky
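The entry above is about pairing annotation words into less ambiguous phrases. As a rough illustration of one way such candidate pairs can be scored, here is a minimal sketch using pointwise mutual information over annotation co-occurrence; the scoring criterion and threshold are assumptions, not the paper's method (which also considers visual-level consistency).

```python
# Sketch: score candidate word pairs for a phrase-level lexicon by pointwise
# mutual information over annotation co-occurrence (threshold is an
# assumption; the paper additionally considers visual consistency).
import math
from collections import Counter
from itertools import combinations

def pmi_phrases(annotations, min_pmi=2.0):
    word_n, pair_n, n_docs = Counter(), Counter(), 0
    for words in annotations:          # words: annotation set of one image
        n_docs += 1
        uniq = sorted(set(words))
        word_n.update(uniq)
        pair_n.update(combinations(uniq, 2))
    scored = []
    for (a, b), n_ab in pair_n.items():
        pmi = math.log(n_ab * n_docs / (word_n[a] * word_n[b]))
        if pmi >= min_pmi:
            scored.append((a, b, pmi))
    return sorted(scored, key=lambda t: -t[2])
```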

DBRIS at ImageCLEF 2012 Photo Annotation Task

Magdalena Rischka, Stefan Conrad
2012 Conference and Labs of the Evaluation Forum  
For our participation in the ImageCLEF 2012 Photo Annotation Task we develop an image annotation system and test several combinations of SIFT-based descriptors with BoW-based image representations.  ...  Our focus is on the comparison of two image representation types which include spatial layout: the spatial pyramids and the visual phrases.  ...  Our automatic image annotation system is based only on visual features.  ... 
dblp:conf/clef/RischkaC12 fatcat:auqwjrwnprgljipsyzlxcchrxa
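For readers unfamiliar with the SIFT-plus-bag-of-visual-words pipeline this entry builds on, here is a minimal sketch using OpenCV and scikit-learn: extract SIFT descriptors, quantize them against a k-means codebook, and histogram the resulting visual words. The vocabulary size and L1 normalization are assumptions, not the authors' settings.

```python
# Sketch of a SIFT + bag-of-visual-words image representation with OpenCV
# and scikit-learn (illustrative, not the authors' system).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(image_paths):
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    return per_image

def bow_histograms(per_image_descs, k=256):
    all_descs = np.vstack([d for d in per_image_descs if len(d)])
    codebook = KMeans(n_clusters=k, n_init=4).fit(all_descs)
    histograms = []
    for desc in per_image_descs:
        hist = np.zeros(k)
        if len(desc):
            words, counts = np.unique(codebook.predict(desc), return_counts=True)
            hist[words] = counts
            hist /= hist.sum()          # L1-normalize the word histogram
        histograms.append(hist)
    return np.array(histograms)
```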

From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
2013 2013 IEEE International Conference on Computer Vision  
We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites  ...  We propose a weakly-supervised framework for object detection where we discover subcategories and the composites automatically with only traditional object-level category labels as input.  ...  [15] manually annotate a list of visual phrases and train global phrase templates for detection.  ... 
doi:10.1109/iccv.2013.53 dblp:conf/iccv/LanRSM13 fatcat:z43bact5mvhl5myp3nlcbx33e4

Less is More: Generating Grounded Navigation Instructions from Landmarks [article]

Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
2022 arXiv   pre-print
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.  ...  To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset.  ...  Acknowledgements We thank Ming Zhao, Subhashini Venugopalan, and Alex Ku for early discussions and brainstorming; Yinfei Yang, Chao Jia and Aashi Jain for help with MURAL image features; Sebastian Goodman  ... 
arXiv:2111.12872v4 fatcat:ictdllrge5fcrhfqlztzkq6ovq

Multi-Modal Word Synset Induction

Jesse Thomason, Raymond J. Mooney
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases.  ...  We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone.  ...  Acknowledgments We would like to thank our anonymous reviewers for their feedback and insights and Subhashini Venugopalan for her help in engineering deep visual feature extraction.  ... 
doi:10.24963/ijcai.2017/575 dblp:conf/ijcai/ThomasonM17 fatcat:xea3e5jodramlfq5tchotqwxba
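A minimal sketch of the clustering step such multi-modal synset induction implies: L2-normalize the text and visual feature vectors of each noun phrase, concatenate them, and cluster. Fusion by concatenation, agglomerative clustering, and the number of synsets are assumptions, not the authors' exact procedure.

```python
# Sketch: cluster noun phrases into synsets using concatenated, per-modality
# normalized text and visual features (illustrative only).
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.cluster import AgglomerativeClustering

def induce_synsets(text_vecs, visual_vecs, phrases, n_synsets=50):
    # Normalize each modality so neither dominates the joint space.
    joint = np.hstack([normalize(text_vecs), normalize(visual_vecs)])
    labels = AgglomerativeClustering(n_clusters=n_synsets).fit_predict(joint)
    synsets = {}
    for phrase, label in zip(phrases, labels):
        synsets.setdefault(label, []).append(phrase)
    return list(synsets.values())
```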

Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity

Xiaoshuai Sun, Jiewei Cao, Chao Li, Lei Zhu, Heng Tao Shen
2017 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
...visual representation which automatically links generic lingual phrases to their related visual contents.  ...  In this paper, we present an automatic approach for on-line discovery of visual-lingual semantic fragments from weakly labeled Internet images.  ...  Flickr30K-Quality: We sample 20K images with 1K phrase labels from Flickr30K-Phrase, and ask in-house annotators to score the quality of each phrase.  ... 
doi:10.1609/aaai.v31i1.10490 fatcat:uwaj6jld6zhoxbvgeqjcjuvik4

PhraseCut: Language-based Image Segmentation in the Wild [article]

Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji
2020 arXiv   pre-print
Our dataset is collected on top of the Visual Genome dataset and uses the existing annotations to generate a challenging set of referring phrases for which the corresponding regions are manually annotated  ...  We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.  ...  Our dataset leverages the annotations in the Visual Genome (VG) dataset [18] to generate a large set of referring phrases for each image.  ... 
arXiv:2008.01187v1 fatcat:s6h5b3uehraofldn2mnansqfaa
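To make the phrase-generation idea concrete, here is a toy sketch of composing a referring phrase from Visual Genome-style region annotations (attribute + category + relationship); the field names and composition rule are illustrative assumptions, not the dataset's actual pipeline.

```python
# Toy sketch: compose a referring phrase from Visual Genome-style region
# annotations; the keys "category", "attributes", and "relationship" are
# assumptions for illustration.
def compose_phrase(region):
    parts = region.get("attributes", [])[:1] + [region["category"]]
    phrase = " ".join(parts)
    rel = region.get("relationship")
    if rel:
        phrase += f" {rel['predicate']} {rel['object']}"
    return phrase

print(compose_phrase({
    "category": "dog",
    "attributes": ["brown"],
    "relationship": {"predicate": "on", "object": "sofa"},
}))  # -> brown dog on sofa
```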

Extracting Visual Knowledge from the Web with Multimodal Learning

Dihong Gong, Daisy Zhe Wang
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
To overcome the deficiency of pure visual techniques, we propose to make use of meta text surrounding images on the Web for enhanced detection accuracy.  ...  We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task.  ...  Image Tagging: The image tagging program automatically assigns each image a set of noun phrases (tags) that best describe the image [Chen et al., 2013a].  ... 
doi:10.24963/ijcai.2017/238 dblp:conf/ijcai/GongW17 fatcat:mfnib6fy7vhb3ilyv5ifytr52m
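As a rough illustration of the image-tagging step the snippet mentions, here is a sketch that extracts candidate noun-phrase tags from the text surrounding a web image using spaCy noun chunks; ranking candidates by frequency is an assumption, not the cited tagging method.

```python
# Sketch: extract candidate noun-phrase tags for a web image from its
# surrounding text via spaCy noun chunks (illustrative only).
# Requires: python -m spacy download en_core_web_sm
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_tags(surrounding_text, max_tags=5):
    doc = nlp(surrounding_text)
    chunks = [chunk.text.lower() for chunk in doc.noun_chunks]
    # Keep the most frequent noun phrases as candidate tags.
    return [tag for tag, _ in Counter(chunks).most_common(max_tags)]
```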

Black Holes and White Rabbits: Metaphor Identification with Visual Features

Ekaterina Shutova, Douwe Kiela, Jean Maillard
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
Metaphor is pervasive in our communication, which makes it an important problem for natural language processing (NLP).  ...  In this paper, we present the first metaphor identification method that simultaneously draws knowledge from linguistic and visual data.  ...  Acknowledgment We are grateful to the NAACL reviewers for their helpful feedback. Ekaterina Shutova's research is supported by the Leverhulme Trust Early Career Fellowship.  ... 
doi:10.18653/v1/n16-1020 dblp:conf/naacl/ShutovaKM16 fatcat:ifpzb3osobbb5kaj7xn32hdjiy
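A hedged sketch of the similarity-based intuition behind such multimodal metaphor identification: if the words of a phrase lie far apart in a fused linguistic-visual embedding space, flag the phrase as potentially metaphorical. The fusion by weighted concatenation and the threshold are assumptions, not the paper's model.

```python
# Sketch of a similarity-based metaphor cue over fused embeddings.
# `ling` and `vis` map words to vectors; alpha and threshold are assumptions.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_metaphor_candidate(ling, vis, w1, w2, alpha=0.5, threshold=0.1):
    fuse = lambda w: np.concatenate([alpha * ling[w], (1 - alpha) * vis[w]])
    # Low similarity between the phrase's words suggests a metaphorical use.
    return cosine(fuse(w1), fuse(w2)) < threshold
```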

From Image Annotation to Image Description [chapter]

Ankush Gupta, Prashanth Mannem
2012 Lecture Notes in Computer Science  
In this paper, we address the problem of automatically generating a description of an image from its annotation.  ...  With this motivation, we present an approach to generate image descriptions from image annotation and show that with accurate object and attribute detection, human-like descriptions can be generated.  ...  Automatic image annotation [1] [2] is useful in various applications like image indexing, image retrieval, search engine optimization and increasing accessibility to visually impaired users.  ... 
doi:10.1007/978-3-642-34500-5_24 fatcat:u6byablayzdltihqwqdbym2d3y
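To illustrate the annotation-to-description direction at its simplest, here is a toy template-based sketch; the triple format and the template are assumptions, and the paper's generation approach is more sophisticated.

```python
# Toy template-based sketch from annotation triples to a description
# (the (subject, relation, object) format is an illustrative assumption).
def describe(triples):
    clauses = [f"a {s} {r} a {o}" for s, r, o in triples]
    return "There is " + ", and ".join(clauses) + "."

print(describe([("brown dog", "sitting on", "sofa")]))
# -> There is a brown dog sitting on a sofa.
```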

Learning Language-Visual Embedding for Movie Understanding with Natural-Language [article]

Atousa Torabi, Niket Tandon, Leonid Sigal
2016 arXiv   pre-print
This test facilitates automatic evaluation of visual-language models for natural language video annotation based on human activities.  ...  Learning a joint language-visual embedding has a number of very appealing properties and can result in a variety of practical applications, including natural language image/video annotation and search.  ...  In this work we study two tasks: 1) standard ranking for video annotation and retrieval; 2) a multiple-choice test, which enables us to automatically evaluate the joint language-visual model based on precise metrics.  ... 
arXiv:1609.08124v1 fatcat:4nzwhzuvizej3e7ll36gq5pvie
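For the ranking task mentioned above, a common way to train a joint language-visual embedding is a bidirectional max-margin ranking loss over matched video-text pairs. Below is a minimal PyTorch sketch, assuming matched pairs sit on the diagonal of the batch similarity matrix; cosine scoring and the margin value are conventional choices, not necessarily the authors'.

```python
# Sketch of a bidirectional max-margin ranking loss for a joint
# video-text embedding (matched pairs on the diagonal).
import torch
import torch.nn.functional as F

def ranking_loss(video_emb, text_emb, margin=0.2):
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    scores = v @ t.t()                      # cosine similarities, shape (B, B)
    pos = scores.diag().unsqueeze(1)        # positive-pair scores
    cost_t = (margin + scores - pos).clamp(min=0)      # rank texts per video
    cost_v = (margin + scores - pos.t()).clamp(min=0)  # rank videos per text
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost_t.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()
```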

Automatic discovery of groups of objects for scene understanding

Congcong Li, D. Parikh, Tsuhan Chen
2012 2012 IEEE Conference on Computer Vision and Pattern Recognition  
A key observation is that these interactions manifest themselves as predictable visual patterns in the image.  ...  Hence, we propose an algorithm that automatically discovers groups of arbitrary numbers of participating objects from a collection of images labeled with object categories.  ...  Secondly, we propose an algorithm to automatically discover these groups from images annotated only with object labels.  ... 
doi:10.1109/cvpr.2012.6247996 dblp:conf/cvpr/LiPC12 fatcat:4yhdwcnqifehxp4ozesmzflhje
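As a simple stand-in for the group-discovery idea, here is a sketch that mines frequently co-occurring object label sets from image-level annotations, apriori-style; the support threshold and maximum group size are assumptions, not the paper's algorithm.

```python
# Sketch: mine frequently co-occurring object-label sets from image-level
# annotations as candidate object groups (illustrative counting scheme).
from collections import Counter
from itertools import combinations

def frequent_groups(image_labels, min_support=20, max_size=3):
    counts = Counter()
    for labels in image_labels:  # labels: object categories in one image
        uniq = sorted(set(labels))
        for size in range(2, max_size + 1):
            counts.update(combinations(uniq, size))
    return [group for group, n in counts.items() if n >= min_support]
```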

Automatic annotation of unique locations from video and text

Chris Engels, Koen Deschacht, Jan Hendrik Becker, Tinne Tuytelaars, Sien Moens, Luc Van Gool
2010 Proceedings of the British Machine Vision Conference 2010  
We apply this scheme to location annotation of a television series for which transcripts are available.  ...  Given a video and associated text, we propose an automatic annotation scheme in which we employ a latent topic model to generate topic distributions from weighted text and then modify these distributions  ...  Acknowledgments We wish to thank Marcin Eichner and Vittorio Ferrari for their assistance with creating pose estimations.  ... 
doi:10.5244/c.24.115 dblp:conf/bmvc/EngelsDBTMG10 fatcat:prnq7d37kzd7jilvfabp3ynrsq
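A minimal sketch of the text side this entry describes: fit a latent topic model (LDA) over transcript windows and read off one topic distribution per window, which can then be modified with visual evidence. Scikit-learn is used for brevity; the text weighting scheme and number of topics are assumptions.

```python
# Sketch: fit LDA over transcript windows and return one topic
# distribution per window (illustrative settings).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_distributions(transcript_windows, n_topics=10):
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(transcript_windows)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(counts)  # rows: topic mixture per window
```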

Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers [article]

Mélodie Boillet, Martin Maarand, Thierry Paquet, Christopher Kermorvant
2021 arXiv   pre-print
We propose a simple pipeline to enrich document images with the position of text lines containing key-phrases and show that running a standard image-based layout analysis system on these images can lead  ...  In this paper, we focus on the use of both visual and textual information for segmenting historical registers into structured and meaningful units such as acts.  ...  ; (5) Using the image augmented with key-phrase line positions as an input for a deep neural network for semantic image segmentation.  ... 
arXiv:2109.08477v1 fatcat:xlvlhs6fjnhddntzihwuxarym4
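The enrichment step described here can be pictured as painting the bounding boxes of key-phrase text lines into an extra channel stacked onto the page image before running the segmentation network. A minimal sketch, assuming axis-aligned pixel boxes; the exact encoding the authors use may differ.

```python
# Sketch: add a binary channel marking key-phrase text-line boxes, given as
# (x0, y0, x1, y1) pixel coordinates (the encoding is an assumption).
import numpy as np

def add_keyphrase_channel(page_image, keyphrase_line_boxes):
    h, w = page_image.shape[:2]
    channel = np.zeros((h, w), dtype=page_image.dtype)
    for x0, y0, x1, y1 in keyphrase_line_boxes:
        channel[y0:y1, x0:x1] = 255
    return np.dstack([page_image, channel])
```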

Charon: a FrameNet Annotation Tool for Multimodal Corpora [article]

Frederico Belcavello, Marcelo Viridiano, Ely Edison Matos, Tiago Timponi Torrent
2022 arXiv   pre-print
Annotation can be made for corpora containing both static images and video sequences, whether or not paired with text sequences.  ...  This paper presents Charon, a web tool for annotating multimodal corpora with FrameNet categories.  ...  annotation module, where they visualize both the annotated sentences and the automatically detected objects.  ... 
arXiv:2205.11836v1 fatcat:ouno77xtpjbfthny75efgf5n5i
Showing results 1 — 15 out of 16,449 results