13,219 Hits in 4.4 sec

Words Matter: Scene Text for Image Classification and Retrieval

Sezer Karaoglu, Ran Tao, Theo Gevers, Arnold W. M. Smeulders
2017 IEEE Transactions on Multimedia  
This paper exploits textual contents in images for fine-grained business place classification and logo retrieval. There are four main contributions.  ...  Text in natural images typically adds meaning to an object or scene.  ...  Text is encoded at a word level and utilized for fine-grained classification and logo retrieval. A generic and fully unsupervised word box proposal method is proposed to detect words in images.  ... 
doi:10.1109/tmm.2016.2638622 fatcat:5einurcv2vhxhfw2vvttca4xte
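The word-box proposal idea mentioned in this entry can be illustrated with a deliberately simple sketch. This is a generic gap-merging heuristic, not the paper's unsupervised method; `propose_words`, the box format, and the `max_gap` threshold are all invented for the illustration:

```python
# Toy word-box proposal heuristic: merge horizontally adjacent character
# boxes into a word box when the gap between them is below a threshold.

def propose_words(char_boxes, max_gap=10):
    """char_boxes: list of (x0, y0, x1, y1), assumed to lie on one text line."""
    boxes = sorted(char_boxes)                 # left to right by x0
    words, cur = [], list(boxes[0])
    for x0, y0, x1, y1 in boxes[1:]:
        if x0 - cur[2] <= max_gap:             # small gap: same word
            cur[2] = max(cur[2], x1)
            cur[1], cur[3] = min(cur[1], y0), max(cur[3], y1)
        else:                                  # large gap: start a new word
            words.append(tuple(cur))
            cur = [x0, y0, x1, y1]
    words.append(tuple(cur))
    return words

chars = [(0, 0, 8, 10), (10, 0, 18, 10), (40, 0, 48, 10)]
print(propose_words(chars))  # first two boxes merge; the third stands alone
```

Real proposal methods score many such groupings over multiple similarity cues; the single gap threshold here is only the simplest instance of the idea.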

Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes [article]

Satwik Kottur, Ramakrishna Vedantam, José M. F. Moura, Devi Parikh
2016 arXiv   pre-print
We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval.  ...  For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food.  ...  For instance, for common sense assertion classification and text-based image retrieval, w is a phrase from a tuple, while for visual paraphrasing w is a sentence.  ... 
arXiv:1511.07067v2 fatcat:5qplkhak35du7dt7nbmqo2em6e

VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes

Satwik Kottur, Ramakrishna Vedantam, Jose M. F. Moura, Devi Parikh
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval.  ...  For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food.  ...  For instance, for common sense assertion classification and text-based image retrieval, w is a phrase from a tuple, while for visual paraphrasing w is a sentence.  ... 
doi:10.1109/cvpr.2016.539 dblp:conf/cvpr/KotturVMP16 fatcat:qlkjvfy4crbb5hjpn4xtl6ebp4
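The grounding idea in this abstract — pulling words that co-occur with the same visual scene cluster closer together in embedding space — can be sketched in a few lines. This toy refinement step is an illustration, not the authors' model; the function name, learning rate, and data are all made up:

```python
import numpy as np

# Toy sketch of visually grounded refinement: words assigned to the same
# visual cluster are pulled toward that cluster's centroid, so visually
# related words (e.g. "eats" and "stares_at") move closer together.

def refine_with_visual_clusters(vecs, clusters, lr=0.5):
    """vecs: {word: np.ndarray}; clusters: {word: visual_cluster_id}."""
    centroids = {}
    for w, c in clusters.items():
        centroids.setdefault(c, []).append(vecs[w])
    centroids = {c: np.mean(vs, axis=0) for c, vs in centroids.items()}
    # Move each clustered word part-way toward its cluster centroid.
    return {w: v + lr * (centroids[clusters[w]] - v) if w in clusters else v
            for w, v in vecs.items()}

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["eats", "stares_at", "runs"]}
# Assumed visual clustering: eating scenes also show staring at food.
clusters = {"eats": 0, "stares_at": 0, "runs": 1}
refined = refine_with_visual_clusters(vecs, clusters)
d0 = np.linalg.norm(vecs["eats"] - vecs["stares_at"])
d1 = np.linalg.norm(refined["eats"] - refined["stares_at"])
print(d1 < d0)  # visually co-occurring words moved closer
```

The actual vis-w2v model trains word embeddings to predict visual cluster membership; the centroid pull above is only the intuition in its simplest form.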

Document Specific Sparse Coding for Word Retrieval

Ravi Shekhar, C.V. Jawahar
2013 12th International Conference on Document Analysis and Recognition  
We have also developed a text-query-based search solution, and we report performance comparable to image-based search.  ...  We further improve the performance by defining a document-specific sparse coding scheme for representing visual words (interest points) in document images.  ...  ACKNOWLEDGEMENT This work was partly supported by Ministry of Communication and Information Technology, Government of India.  ... 
doi:10.1109/icdar.2013.132 dblp:conf/icdar/ShekharJ13 fatcat:pxsd7vjphjeathhggt2t5pdeza

Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions [article]

Arjun R Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy
2020 arXiv   pre-print
We critically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure, i.e., words are enough to identify the target object; the word order doesn't matter.  ...  Acknowledgements We would like to thank Volkan Cirik, Licheng Yu, Jiasen Lu for their help with GroundNet, MattNet and ViLBERT respectively, Keze Wang for his help with technical issues, and AWS AI data  ... 
arXiv:2005.01655v1 fatcat:4p4geps4kbcoxal2giroftmvza

Front Matter: Volume 8658

Richard Zanibbi, Bertrand Coüasnon
2013 Document Recognition and Retrieval XX  
Paper Numbering: Proceedings of SPIE follow an e-First publication model, with papers published first online and then in print and on CD-ROM.  ...  Papers are published as they are submitted and meet publication criteria.  ... 
doi:10.1117/12.2020094 fatcat:kvr4h3apybgzzcd2kqytooax6q

Handwritten English Character Recognition and translate English to Devnagari Words

Shivali Parkhedkar, Shaveri Vairagade, Vishakha Sakharkar, Bharti Khurpe, Arpita Pikalmunde, Amit Meshram, Rakesh Jambhulkar
2019 International Journal of Scientific Research in Computer Science Engineering and Information Technology  
The image of the scanned document is processed using the program. Each character in the word is isolated.  ...  In our proposed work we take up the challenge of recognizing handwritten words.  The handwritten document is scanned using a scanner.  ...  At test time, given an image word and a text word, the model computes the probability of that text word being produced by the model when fed with the image word.  ... 
doi:10.32628/cseit19528 fatcat:rco6sq3vtncf5makybr4rehb44

Descriptive visual words and visual phrases for image applications

Shiliang Zhang, Qi Tian, Gang Hua, Qingming Huang, Shipeng Li
2009 Proceedings of the 17th ACM International Conference on Multimedia (MM '09)  
In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive to certain scenes or objects are identified as the DVWs and DVPs.  ...  The DVW and DVP combination outperforms the classic visual words by 19.5% and 80% in image retrieval and object recognition tasks, respectively.  ...  The spatial distribution of texton is modeled in [1] for scene classification.  ... 
doi:10.1145/1631272.1631285 dblp:conf/mm/ZhangTHHL09 fatcat:siqnpncbtvhczpp55fdk647efe
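The classic visual-words baseline that the DVW/DVP approach improves on can be sketched as tf-idf-weighted histogram retrieval. This is a generic bag-of-visual-words illustration, not the paper's method; the weighting details and toy data are assumptions:

```python
import numpy as np

# Bag-of-visual-words retrieval sketch: each image is a histogram over a
# visual vocabulary; retrieval ranks database images by cosine similarity
# of tf-idf weighted histograms.

def tfidf(hists):
    hists = np.asarray(hists, dtype=float)
    df = (hists > 0).sum(axis=0)                      # document frequency
    idf = np.log((1 + len(hists)) / (1 + df)) + 1.0   # smoothed idf
    tf = hists / np.maximum(hists.sum(axis=1, keepdims=True), 1)
    return tf * idf

def rank(query_hist, db_hists):
    mats = tfidf(np.vstack([query_hist, db_hists]))
    q, db = mats[0], mats[1:]
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)                          # best match first

# 4-word toy vocabulary; image 1 shares the query's rare visual words.
db = [[5, 0, 0, 0], [0, 1, 3, 2], [1, 1, 0, 0]]
order = rank([0, 2, 3, 1], db)
print(order[0])  # image 1 ranks first
```

Descriptive visual words/phrases refine this baseline by keeping only vocabulary entries (and spatial pairs) that discriminate particular scenes or objects, rather than weighting all words by frequency alone.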

Learning to Learn Words from Visual Scenes [article]

Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
2020 arXiv   pre-print
Language acquisition is the process of learning words from the surrounding scene. We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes.  ...  We leverage the natural compositional structure of language to create training episodes that cause a meta-learner to learn strong policies for language acquisition.  ...  Acknowledgements: We thank Alireza Zareian, Bobby Wu, Spencer Whitehead, Parita Pooj and Boyuan Chen for helpful discussion. Funding for this research was provided by DARPA GAILA HR00111990058.  ... 
arXiv:1911.11237v3 fatcat:hveh5cjwzjdgzmseg32uefslju

Front Matter: Volume 11373

Zhigeng Pan, Xun Wang
2020 Eleventh International Conference on Graphics and Image Processing (ICGIP 2019)  
Publication of record for individual papers is online in the SPIE Digital Library. SPIEDigitalLibrary.org Paper Numbering: Proceedings of SPIE follow an e-First publication model.  ...  Utilization of CIDs allows articles to be fully citable as soon as they are published online, and connects the same identifier to all online and print versions of the publication.  ... 
doi:10.1117/12.2561685 fatcat:5dwsd2oxjjcdllprs6rtum4tfu

Front Matter: Volume 10341

2017 Ninth International Conference on Machine Vision (ICMV 2016)  
Some conference presentations may not be available for publication. Additional papers and presentation recordings may be available online in the SPIE Digital Library at SPIEDigitalLibrary.org.  ...  The publisher is not responsible for the validity of the information or for any outcomes resulting from reliance thereon. Please use the following format to cite material from these proceedings:  ... 
doi:10.1117/12.2276832 dblp:conf/icmv/X16 fatcat:srr4hyfwpfcipadcvjc5jdll6i

TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild [article]

Lluis Gomez-Bigorda, Dimosthenis Karatzas
2017 arXiv   pre-print
In this paper we introduce a novel object proposals method that is specifically designed for text.  ...  Moreover, the combination of our object proposals with existing whole-word recognizers shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published  ...  Acknowledgment This project was supported by the Spanish project TIN2014-52072-P, the fellowship RYC-2009-05031, and the Catalan government scholarship 2014FI B1-0017.  ... 
arXiv:1604.02619v3 fatcat:bnerugqj6nclxaueapd2tgjzqy

Neural Word Search in Historical Manuscript Collections [article]

Tomas Wilkinson, Jonas Lindström, Anders Brun
2020 arXiv   pre-print
We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting".  ...  Given the time-consuming manual work required to study old manuscripts in the humanities, quick and robust tools for word spotting have the potential to revolutionise domains like history, religion and  ... 
arXiv:1812.02771v2 fatcat:f4w3b3ee35favbxrr4pczyqpzq

Information Extraction: The Power of Words and Pictures

Marie-Francine Moens
2007 Information Technology Interfaces  
A number of challenging and emerging research directions are enumerated and illustrated with results obtained by the research group of the author.  ...  The paper stresses the importance of automatically analyzing and semantically annotating creative forms of human expression, among which are textual sources.  ...  Acknowledgements We are very grateful to the organizations that sponsored the research projects mentioned: ACILA (Automatic Detection and Classification of Arguments in a Legal Case), K.  ... 
doi:10.1109/iti.2007.4283737 fatcat:2ajmmbxndfe5vlm6ppgbeinkqi

Information Extraction: The Power of Words and Pictures

Marie-Francine Moens
2007 Journal of Computing and Information Technology  
A number of challenging and emerging research directions are enumerated and illustrated with results obtained by the research group of the author.  ...  The paper stresses the importance of automatically analyzing and semantically annotating creative forms of human expression, among which are textual sources.  ...  Acknowledgements We are very grateful to the organizations that sponsored the research projects mentioned: ACILA (Automatic Detection and Classification of Arguments in a Legal Case), K.  ... 
doi:10.2498/cit.1001136 fatcat:tfpcm22xdranzmo6uo2sdlk7ya
Showing results 1 — 15 out of 13,219 results