Showing results 1–15 of 1,977 (5.8 sec)

Scene Text Synthesis for Efficient and Effective Deep Network Training [article]

Fangneng Zhan, Hongyuan Zhu, Shijian Lu
2019 arXiv   pre-print
A large amount of annotated training images is critical for training accurate and robust deep network models, but collecting such images is often time-consuming and  ...  similar or even better scene text detection and scene text recognition performance as compared with using real images.  ...  Scene Text Detection Implementation: for the scene text detection task, we adopt an adapted EAST model [58] to train all text detectors.  ... 
arXiv:1901.09193v1 fatcat:rqtkjb6qgnag3de3zc2cwzcqpm

A multi-modal system for the retrieval of semantic video events

Arnon Amir, Sankar Basu, Giridharan Iyengar, Ching-Yung Lin, Milind Naphade, John R. Smith, Savitha Srinivasan, Belle Tseng
2004 Computer Vision and Image Understanding  
These classifiers are used to automatically annotate video with semantic labels, which in turn are used to search for new, untrained types of events and semantic concepts.  ...  A framework for event detection is proposed where events, objects, and other semantic concepts are detected from video using trained classifiers.  ...  Acknowledgments We are very grateful to Paul Over and Ramazan Taban, NIST, for organizing the video track.  ... 
doi:10.1016/j.cviu.2004.02.006 fatcat:enuxpfgaxbggphy6g2fwsgdmti

Semantic video content annotation at the object level

Vanessa El-Khoury, Martin Jergler, David Coquil, Harald Kosch
2012 Proceedings of the 10th International Conference on Advances in Mobile Computing & Multimedia - MoMM '12  
SVCAT is a semi-automatic annotation tool compliant with the MPEG-7 standard, which produces metadata according to an object-based video content model described in this paper.  ...  To address these shortcomings, we propose the Semantic Video Content Annotation Tool (SVCAT) for structural and high-level semantic annotation.  ...  Particularly, it achieves a semi-automatic annotation at the object level.  ... 
doi:10.1145/2428955.2428991 dblp:conf/momm/El-KhouryJCK12 fatcat:kv7dwlyf7fdcvic5pzxntx36pu

NEIL: Extracting Visual Knowledge from Web Data

Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
2013 2013 IEEE International Conference on Computer Vision  
NEIL uses a semi-supervised learning algorithm that jointly discovers common sense relationships (e.g., "Corolla is a kind of/looks similar to Car", "Wheel is a part of Car") and labels instances of the  ...  As of 10th October 2013, NEIL has been continuously running for 2.5 months on a 200-core cluster (more than 350K CPU hours) and has an ontology of 1152 object categories, 1034 scene categories and 87 attributes  ...  Acknowledgements: This research was supported by ONR MURI N000141010934 and a gift from Google. The authors would like to thank Tom Mitchell and David Fouhey for insightful discussions.  ... 
doi:10.1109/iccv.2013.178 dblp:conf/iccv/ChenSG13 fatcat:qks2a3nkanf5vabuqxehikj7ee

I2T: Image Parsing to Text Description

Benjamin Z Yao, Xiong Yang, Liang Lin, Mun Wai Lee, Song-Chun Zhu
2010 Proceedings of the IEEE  
is a formal and unambiguous knowledge representation. 3) A text generation engine converts the semantic representation into a semantically meaningful, human-readable and query-able text report.  ...  The first one is a visual knowledge base that provides top-down hypotheses for image parsing and serves as an image ontology for translating parse graphs into semantic representations.  ...  as scene classification, object detection, aerial image understanding, text detection and recognition, as well as low-level tasks such as edge detection and edge attribute annotation.  ... 
doi:10.1109/jproc.2010.2050411 fatcat:efostazxbrhghmtctxr6uoqdfu

Author Index

2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
Excerpt from the author index: 3D Deformable Surface Reconstruction · Torsello, Andrea · A Game-Theoretic Approach to Fine Surface Registration without Initial Motion Estimation · Object Detection via Boundary Structure Segmentation · Detecting  ...  Object and Human Pose in Human-Object Interaction Activities · Connecting Modalities: Semi-supervised Segmentation and Annotation of Images Using Unaligned Text Corpora · Felzenszwalb, Pedro F.  ... 
doi:10.1109/cvpr.2010.5539913 fatcat:y6m5knstrzfyfin6jzusc42p54

Unsupervised Approaches for Textual Semantic Annotation, A Survey

Xiaofeng Liao, Zhiming Zhao
2019 ACM Computing Surveys  
ACKNOWLEDGMENTS The authors thank the anonymous reviewers for their helpful comments, in addition to Cees de Laat, Paul Martin, Jayachander Surbiryala, and ZeShun Shi for useful discussions.  ...  GoNTogle (2010): semi-automatic, supervised; user annotation history is used for model training.  ... 
doi:10.1145/3324473 fatcat:fg5ucwtloze6ljdlh4hqjkqxfe

State of the Art: A Summary of Semantic Image and Video Retrieval Techniques

S. Suguna, C. Ranjith Kumar, D. Sheela Jeyarani
2015 Indian Journal of Science and Technology  
For these reasons, semantic video retrieval has become a challenging issue in various industries.  ...  The survey of many methods leads to the introduction of a combination of the best methods.  ...  Third, a new cross-modal model is proposed to solve these problems based on the hypotheses of Content Matching (CM), Semantic Matching (SM) and Semantic Correlation Matching (SCM).  ... 
doi:10.17485/ijst/2015/v8i35/77061 fatcat:2htopyojqjd7bkjt6mx66cf24i

System design for structured hypermedia generation [chapter]

Marcel Worring, Carel van den Berg, Lynda Hardman, Audrey Tam
1997 Lecture Notes in Computer Science  
In this contribution we consider the design of a hypermedia information system that not only includes standard functionality of storage and presentation, but also the automatic generation of hypermedia  ...  presentations on the basis of a domain-dependent knowledge base.  ...  In this step, each component is semi-automatically assigned a semantic annotation.  ... 
doi:10.1007/3-540-63636-6_15 fatcat:6kegqoguhremfil7qekpnwpj7y

A Generic Framework for Video Annotation via Semi-Supervised Learning

Tianzhu Zhang, Changsheng Xu, Guangyu Zhu, Si Liu, Hanqing Lu
2012 IEEE transactions on multimedia  
In this paper, we propose a novel approach based on semi-supervised learning by means of information from the Internet for interesting event annotation in videos.  ...  Concretely, a Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm, which aims to simultaneously tackle these difficulties in a generic framework for various video domains (e.g  ... 
doi:10.1109/tmm.2012.2191944 fatcat:7uaujwzq4nfrto5jaim7bf4ify
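
The snippet above only names the FGSSMIL algorithm; as a point of reference, the sketch below shows plain graph-based semi-supervised label propagation (the iterative local-and-global-consistency scheme), which is the general family this entry builds on. It is not the paper's algorithm: the RBF similarity graph, the propagation weight alpha, and the toy two-cluster data are illustrative assumptions.

import numpy as np

def propagate_labels(X, y, sigma=1.0, alpha=0.9, iters=50):
    """X: (n, d) features; y: (n,) labels with -1 marking unlabelled samples."""
    n = X.shape[0]
    # RBF similarity graph over all samples (labelled and unlabelled).
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalised smoother S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed matrix for the labelled nodes.
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    # Iterate F <- alpha * S F + (1 - alpha) * Y; labels diffuse along the graph.
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[F.argmax(1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
    y = np.full(40, -1)            # mostly unlabelled video segments
    y[0], y[20] = 0, 1             # one labelled seed per event class
    print(propagate_labels(X, y))  # labels spread to the unlabelled segments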

Extracting Semantics from Multimedia Content: Challenges and Solutions [chapter]

Lexing Xie, Rong Yan
2008 Signals and Communication Technology  
We start with a system overview of the five major components that extract and use semantic metadata: data annotation, multimedia ontology, feature representation, model learning and retrieval systems  ...  In this chapter, we present a review on extracting semantics from a large amount of multimedia data as a statistical learning problem.  ...  and associated text keywords via a directed graphical model.  ... 
doi:10.1007/978-0-387-76569-3_2 fatcat:jul6fw7esfaurct6erjnvpcq6q

A Dataset for Lane Instance Segmentation in Urban Environments [chapter]

Brook Roberts, Sebastian Kaltwang, Sina Samangooei, Mark Pender-Bare, Konstantinos Tertikas, John Redford
2018 Lecture Notes in Computer Science  
Therefore, we propose a semi-automated method that allows for efficient labelling of image sequences by utilising an estimated road plane in 3D based on where the car has driven and projecting labels from  ...  We notice that driving the car is itself a form of annotation.  ...  Acknowledgements We would like to thank our colleagues Tom Westmacott, Joel Jakubovic and Robert Chandler, who have contributed to the implementation of the annotation software.  ... 
doi:10.1007/978-3-030-01237-3_33 fatcat:luvvq5dhhzbzddxirrkbwac5ma

A Dataset for Lane Instance Segmentation in Urban Environments [article]

Brook Roberts, Sebastian Kaltwang, Sina Samangooei, Mark Pender-Bare, Konstantinos Tertikas, John Redford
2018 arXiv   pre-print
Therefore, we propose a semi-automated method that allows for efficient labelling of image sequences by utilising an estimated road plane in 3D based on where the car has driven and projecting labels from  ...  We notice that driving the car is itself a form of annotation.  ...  Acknowledgements We would like to thank our colleagues Tom Westmacott, Joel Jakubovic and Robert Chandler, who have contributed to the implementation of the annotation software.  ... 
arXiv:1807.01347v2 fatcat:audqzcv5ardgvnn6h5u3criime
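
Both lane-dataset entries describe projecting labels from an estimated 3D road plane into the images the car recorded. The sketch below illustrates only that geometric step with a pinhole camera model; the intrinsics, the camera height, and the flat-road lane points are assumed values for illustration, not parameters from the dataset.

import numpy as np

def project_points(P_world, R, t, K):
    """Project (n, 3) world points into pixel coordinates with a pinhole camera.
    R, t: world-to-camera rotation (3x3) and translation (3,); K: intrinsics (3x3)."""
    P_cam = (R @ P_world.T).T + t     # world frame -> camera frame
    uvw = (K @ P_cam.T).T             # camera frame -> homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

if __name__ == "__main__":
    K = np.array([[1000., 0., 640.],              # assumed focal lengths and principal point
                  [0., 1000., 360.],
                  [0., 0., 1.]])
    R, t = np.eye(3), np.array([0., 1.5, 0.])     # camera roughly 1.5 m above the road plane
    # A lane marking on the road plane (y = 0 in world coordinates), 5-20 m ahead of the car.
    lane = np.stack([np.zeros(4), np.zeros(4), np.array([5., 10., 15., 20.])], 1)
    print(project_points(lane, R, t, K))          # pixel locations of the projected lane label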

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Georgios Evangelopoulos, Athanasia Zlatintsi, Alexandros Potamianos, Petros Maragos, Konstantinos Rapantzikos, Georgios Skoumas, Yannis Avrithis
2013 IEEE transactions on multimedia  
Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream.  ...  Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities.  ...  Rodomagoulakis for the additional movie annotations, T. Apostolidis for providing the expert movie summaries, and the anonymous reviewers for their suggestions towards improving this paper.  ... 
doi:10.1109/tmm.2013.2267205 fatcat:jjt7xmjh5narlm5wr2strvrqza
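
The entry above compares fusion schemes over audio, visual, and textual saliency; the sketch below shows the simplest such scheme, a fixed-weight linear fusion of per-frame saliency curves followed by thresholding to pick summary candidates. The weights and the toy curves are assumptions, not values from the paper.

import numpy as np

def fuse_saliency(audio, visual, text, weights=(0.4, 0.4, 0.2)):
    """Each input is an (n_frames,) saliency curve in [0, 1]; returns the fused curve."""
    stacked = np.stack([audio, visual, text])   # (3, n_frames)
    w = np.asarray(weights)[:, None]
    return (w * stacked).sum(0) / w.sum()       # weighted average per frame

if __name__ == "__main__":
    n = 8
    rng = np.random.default_rng(1)
    fused = fuse_saliency(rng.random(n), rng.random(n), rng.random(n))
    keep = fused >= np.quantile(fused, 0.75)    # keep the top-25% most salient frames
    print(fused.round(2), keep)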

What Are You Talking About? Text-to-Image Coreference

Chen Kong, Dahua Lin, Mohit Bansal, Raquel Urtasun, Sanja Fidler
2014 2014 IEEE Conference on Computer Vision and Pattern Recognition  
We show that our approach significantly improves 3D detection and scene classification accuracy, and is able to reliably estimate the text-to-image alignment.  ...  Towards this goal, we propose a structure prediction model that exploits potentials computed from text and RGB-D imagery to reason about the class of the 3D objects, the scene type, as well as to align  ...  Text can help us parse the visual scene in a more informed way, and can facilitate, for example, new ways of active labeling and learning.  ... 
doi:10.1109/cvpr.2014.455 dblp:conf/cvpr/KongLBUF14 fatcat:gbujfihrh5hazloq4cccnjfmiu