Enforcing similarity constraints with integer programming for better scene text recognition

David L. Smith, Jacqueline Feild, Erik Learned-Miller
2011 CVPR 2011  
Weinman and Learned-Miller [14] showed that the similarity among characters, as a supplement to the appearance of the characters with respect to a model, could be used to improve scene text recognition  ...  In this work, we make further improvements to scene text recognition by taking a novel approach to the incorporation of similarity.  ...  Acknowledgements The authors thank Jerod Weinman for the precomputed feature data and several helpful discussions. J. Feild was supported by an NSF Graduate Research Fellowship.  ... 
doi:10.1109/cvpr.2011.5995700 dblp:conf/cvpr/SmithFL11 fatcat:f4odaudskfefjmlk4wdetjqehq

Combining Per-frame and Per-track Cues for Multi-person Action Recognition [chapter]

Sameh Khamis, Vlad I. Morariu, Larry S. Davis
2012 Lecture Notes in Computer Science  
We propose a model to combine per-frame and per-track cues for action recognition.  ...  With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action in a scene and the flow of actions of an individual in a video sequence, inferring valid  ...  Consequently, we can relax the binary constraint to an interval constraint and still guarantee an integer solution to the linear program. We therefore use a fast interior-point solver.  ... 
doi:10.1007/978-3-642-33718-5_9 fatcat:hji47lhygfbgvdymul2pv4fuui

Linking People in Videos with "Their" Names Using Coreference Resolution [chapter]

Vignesh Ramanathan, Armand Joulin, Percy Liang, Li Fei-Fei
2014 Lecture Notes in Computer Science  
What is needed are models that can reason over uncertainty over both videos and text.  ...  We develop a joint model for person naming and coreference resolution, and in the process, infer a latent alignment between tracks and mentions.  ...  Yeung for helpful comments and feedback. This research is partially supported by Intel, the NSF grant IIS-1115493 and DARPA-Mind's Eye grant.  ... 
doi:10.1007/978-3-319-10590-1_7 fatcat:zmx42zoh5vbctptpy54jvdfpla

Automatic Detection and Inpainting of Text Images

S. Bhuvaneswari, T. S. Subashini
2013 International Journal of Computer Applications  
The detected text region is then inpainted using the fast marching algorithm, which uses the pixel information present in the non-text region of the image to inpaint the detected text region.  ...  The proposed system detects text using connected component labelling and a set of selection/rejection criteria which help to retain the text region alone.  ...  Learned-Miller, "Enforcing similarity constraints with integer programming for better scene text recognition", CVPR, pp. 73-80, 2011. Fig. 1: Inpainting process.  ... 
doi:10.5120/9941-4578 fatcat:k3kvwodkk5cdxoecmrakb5wxq4

Optimal Reduction of Large Image Databases for Location Recognition

Michal Havlena, Wilfried Hartmann, Konrad Schindler
2013 2013 IEEE International Conference on Computer Vision Workshops  
We show how the minimum DS can nevertheless be solved to global optimality efficiently in practice, by formulating it as an integer linear program (ILP).  ...  For some computer vision tasks, such as location recognition on mobile devices or Structure from Motion (SfM) computation from Internet photo collections, one wants to reduce a large set of images to a  ...  Acknowledgements The authors would like to thank Vojtěch Franc for discussions about the ILP formulation of the CDS problem, and all Flickr users whose images were used in the project.  ... 
doi:10.1109/iccvw.2013.93 dblp:conf/iccvw/HavlenaHS13 fatcat:cwvywg6p7bfxhd4auy5okayywm

TreeTalk: Composition and Compression of Trees for Image Descriptions

Polina Kuznetsova, Vicente Ordonez, Tamara L. Berg, Yejin Choi
2014 Transactions of the Association for Computational Linguistics  
Our proposed system attains significantly better performance than previous approaches for both image caption generalization and generation.  ...  Key algorithmic components are tree composition and compression, both integrating tree structure with sequence structure.  ...  The problem of choosing phrase order together with the best parse tree of the description is a complex optimization problem, which we solve using Integer Linear Programming (ILP).  ... 
doi:10.1162/tacl_a_00188 fatcat:pv7me3oioravnn2m6wfvovbpzy

Multimodal Machine Learning: A Survey and Taxonomy [article]

Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
2017 arXiv   pre-print
This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.  ...  It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential.  ...  [114] first retrieve phrases that describe visually similar images and then combine them to generate novel descriptions of the query image by using Integer Linear Programming with a number of hand crafted  ... 
arXiv:1705.09406v2 fatcat:262fo4sihffvxecg4nwsifoddm

Class consistent multi-modal fusion with binary features

Ashish Shrivastava, Mohammad Rastegari, Sumit Shekhar, Rama Chellappa, Larry S. Davis
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features.  ...  Many existing recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time.  ...  We develop this intuition into an optimization problem that can be solved via quadratic programming for continuous features, and mixed integer programming for binary features.  ... 
doi:10.1109/cvpr.2015.7298841 dblp:conf/cvpr/ShrivastavaRSCD15 fatcat:xyojwbucmvevdimjd7o6o77y7e

Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization [article]

Shoou-I Yu, Yi Yang, Xuanchong Li, Alexander G. Hauptmann
2017 arXiv   pre-print
We tested our algorithm on a 23-day 15-camera data set (4,935 hours total), and we were able to localize a person 53.2% of the time with 69.8% precision.  ...  Therefore, we propose a multi-person tracking algorithm for very long-term (e.g. month-long) multi-camera surveillance scenarios.  ...  Many trackers have been formulated as a general Integer Linear Programming (ILP) problem.  ... 
arXiv:1604.07468v2 fatcat:glk4uwjsyvgo5cjvr3pclp32si

Baby talk: Understanding and generating simple image descriptions

Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, Tamara L Berg
2011 CVPR 2011  
The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best  ...  We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human generated reference descriptions.  ...  Finally, in this paper, we formulate the optimization as an integer linear program, dictating the form of the objective function and constraints.  ... 
doi:10.1109/cvpr.2011.5995466 dblp:conf/cvpr/KulkarniPDLCBB11 fatcat:dow4w2mterdsbavtecuxjfd6qq

BabyTalk: Understanding and Generating Simple Image Descriptions

Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, Tamara L. Berg
2013 IEEE Transactions on Pattern Analysis and Machine Intelligence  
The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best  ...  We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human generated reference descriptions.  ...  Finally, in this paper, we formulate the optimization as an integer linear program, dictating the form of the objective function and constraints.  ... 
doi:10.1109/tpami.2012.162 pmid:22848128 fatcat:qhye4obzpbcllos2dr6rohli2u

Multimodal Image Outpainting with Regularized Normalized Diversification

Lingzhi Zhang, Jiancong Wang, Jianbo Shi
2020 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)  
While recent approaches [32, 28] propose to maximize or preserve the pairwise distance between generated samples with respect to their latent distance, they do not explicitly prevent the diverse samples  ...  Figure 1: Given only a small foreground region, our model can learn to outpaint a set of diverse and plausible missing backgrounds in both face image and street scene image.  ...  Acknowledgement We gratefully appreciate the support from Honda Research Institute Curious Minded Machine Program. We also gratefully acknowledge a GPU donation from NVIDIA.  ... 
doi:10.1109/wacv45572.2020.9093636 dblp:conf/wacv/ZhangWS20a fatcat:4tiqyfsvvjclroesqugdbibsii

Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast

Hervé Bredin, Anindya Roy, Viet-Bac Le, Claude Barras
2014 International Journal of Multimedia Information Retrieval  
This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data.  ...  It relies on Integer Linear Programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem.  ...  Acknowledgements This work was partly realized as part of the Quaero Program and the QCompere project, respectively funded by OSEO (French State agency for innovation) and ANR (French national research  ... 
doi:10.1007/s13735-014-0055-y fatcat:mlvyk5h5v5c4pmo4nvvljqs5ga

On support relations and semantic scene graphs

Michael Ying Yang, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn
2017 ISPRS Journal of Photogrammetry and Remote Sensing  
Scene understanding is a popular and challenging topic in both computer vision and photogrammetry. Scene graph provides rich information for such scene understanding.  ...  In contrast to previous methods for extracting support relations, the proposed approach generates more accurate results, and does not require a pixel-wise semantic labeling of the scene.  ...  Energy minimization The minimization of the energy function Eq. (5) can be formulated as an integer programming problem.  ... 
doi:10.1016/j.isprsjprs.2017.07.010 fatcat:suxv7piwbrg4ln24wxfv3exraq

Orientation Robust Text Line Detection in Natural Images

Le Kang, Yi Li, David Doermann
2014 2014 IEEE Conference on Computer Vision and Pattern Recognition  
Then, higher-order correlation clustering (HOCC) is used to partition the MSERs into text line candidates, using the hypotheses as soft constraints to enforce long range interactions.  ...  In this paper, higher-order correlation clustering (HOCC) is used for text line detection in natural images.  ...  The original HOCC proposed a linear programming relaxation solution with a large number of inequality constraints. This complex linear system can be written elegantly in the SDP framework.  ... 
doi:10.1109/cvpr.2014.514 dblp:conf/cvpr/KangLD14 fatcat:jeqnmviizngmdkpa5zyfwj7sdy