A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Enforcing similarity constraints with integer programming for better scene text recognition
2011
CVPR 2011
Weinman and Learned-Miller [14] showed that the similarity among characters, as a supplement to the appearance of the characters with respect to a model, could be used to improve scene text recognition ...
In this work, we make further improvements to scene text recognition by taking a novel approach to the incorporation of similarity. ...
Acknowledgements: The authors thank Jerod Weinman for the precomputed feature data and several helpful discussions. J. Feild was supported by an NSF Graduate Research Fellowship. ...
doi:10.1109/cvpr.2011.5995700
dblp:conf/cvpr/SmithFL11
fatcat:f4odaudskfefjmlk4wdetjqehq
Combining Per-frame and Per-track Cues for Multi-person Action Recognition
[chapter]
2012
Lecture Notes in Computer Science
We propose a model to combine per-frame and per-track cues for action recognition. ...
With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action in a scene and the flow of actions of an individual in a video sequence, inferring valid ...
Consequently, we can relax the binary constraint to an interval constraint and still guarantee an integer solution to the linear program. We therefore use a fast interior-point solver. ...
doi:10.1007/978-3-642-33718-5_9
fatcat:hji47lhygfbgvdymul2pv4fuui
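The snippet above says the binary constraint can be relaxed to an interval constraint while still guaranteeing an integer solution to the linear program, but it does not say why. A standard sufficient condition is that the constraint matrix is totally unimodular (every square submatrix has determinant -1, 0 or 1) and the right-hand side is integral. Assuming that is the property at work, the minimal Python sketch below checks total unimodularity by brute force. The helper names and the two example matrices (a 2x2 assignment problem and the incidence matrix of a triangle) are illustrative choices, not taken from the paper, and the exhaustive check is only practical for very small matrices.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative helpers (hypothetical names), not from the paper.
def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def is_totally_unimodular(matrix):
    """Brute force: every square submatrix must have determinant -1, 0 or 1."""
    rows, cols = len(matrix), len(matrix[0])
    for k in range(1, min(rows, cols) + 1):
        for row_idx in combinations(range(rows), k):
            for col_idx in combinations(range(cols), k):
                sub = [[matrix[r][c] for c in col_idx] for r in row_idx]
                if determinant(sub) not in (-1, 0, 1):
                    return False
    return True

# 2x2 assignment problem, variables ordered x11, x12, x21, x22.
assignment = [
    [1, 1, 0, 0],  # worker 1 takes exactly one task
    [0, 0, 1, 1],  # worker 2 takes exactly one task
    [1, 0, 1, 0],  # task 1 is taken exactly once
    [0, 1, 0, 1],  # task 2 is taken exactly once
]
# Vertex-edge incidence matrix of a triangle (an odd cycle).
triangle = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(is_totally_unimodular(assignment))  # True
print(is_totally_unimodular(triangle))    # False (full determinant is 2)
```

The bipartite assignment structure passes the check, while the triangle fails because its full determinant is 2; an LP relaxation over such a matrix can have fractional vertices, such as (1/2, 1/2, 1/2).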
Linking People in Videos with "Their" Names Using Coreference Resolution
[chapter]
2014
Lecture Notes in Computer Science
What is needed are models that can reason about uncertainty over both videos and text. ...
We develop a joint model for person naming and coreference resolution, and in the process, infer a latent alignment between tracks and mentions. ...
Yeung for helpful comments and feedback. This research is partially supported by Intel, the NSF grant IIS-1115493 and the DARPA Mind's Eye grant. ...
doi:10.1007/978-3-319-10590-1_7
fatcat:zmx42zoh5vbctptpy54jvdfpla
Automatic Detection and Inpainting of Text Images
2013
International Journal of Computer Applications
The detected text region is then inpainted using the fast marching algorithm, which uses the pixel information present in the non-text region of the image to inpaint the detected text region. ...
The proposed system detects text using connected component labelling and a set of selection/rejection criteria that help retain only the text regions. ...
Learned-Miller, "Enforcing similarity constraints with integer programming for better scene text recognition", CVPR, pp. 73-80, 2011. Fig. 1: Inpainting process. ...
doi:10.5120/9941-4578
fatcat:k3kvwodkk5cdxoecmrakb5wxq4
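The detection step in this snippet combines connected component labelling with selection/rejection criteria. A minimal Python sketch of that idea follows. It assumes a binary 0/1 image stored as nested lists, 4-connectivity, and a hypothetical rejection rule (minimum area and maximum bounding-box aspect ratio). The paper's actual criteria are not given in the snippet, and the function names and thresholds are illustrative only.

```python
from collections import deque

def label_components(binary):
    """4-connected component labelling of a 2D 0/1 grid via breadth-first search.
    Returns a label grid (0 = background) and a list of components as pixel lists."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                label = len(components) + 1
                labels[y][x] = label
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = label  # mark on enqueue to avoid duplicates
                            queue.append((ny, nx))
                components.append(pixels)
    return labels, components

def keep_text_like(components, min_area=2, max_aspect=5.0):
    """Hypothetical rejection rule: drop tiny blobs and very elongated ones."""
    kept = []
    for pixels in components:
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        aspect = max(height, width) / min(height, width)
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append(pixels)
    return kept

# Hypothetical binarized image: a 2x2 blob, a single noise pixel, and a long line.
image = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
labels, comps = label_components(image)
print(len(comps))                  # 3
print(len(keep_text_like(comps)))  # 1
```

Here only the 2x2 blob survives: the isolated pixel fails the area test, and the 1x7 line fails the aspect-ratio test.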
Optimal Reduction of Large Image Databases for Location Recognition
2013
2013 IEEE International Conference on Computer Vision Workshops
We show how the minimum DS can nevertheless be solved to global optimality efficiently in practice, by formulating it as an integer linear program (ILP). ...
For some computer vision tasks, such as location recognition on mobile devices or Structure from Motion (SfM) computation from Internet photo collections, one wants to reduce a large set of images to a ...
Acknowledgements: The authors would like to thank Vojtěch Franc for discussions about the ILP formulation of the CDS problem, and all Flickr users whose images were used in the project. ...
doi:10.1109/iccvw.2013.93
dblp:conf/iccvw/HavlenaHS13
fatcat:cwvywg6p7bfxhd4auy5okayywm
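The snippet describes solving a minimum dominating set (DS) to global optimality by writing it as an integer linear program. The minimal Python sketch below states the textbook dominating-set ILP in its docstring. Instead of calling an ILP solver, it enumerates candidate subsets in order of size, which reaches the same optimum on toy inputs. The graph and the function name are hypothetical. The paper's own formulation may add constraints that are not shown here; its acknowledgement refers to a CDS variant.

```python
from itertools import combinations

# Illustrative helper (hypothetical name), not the paper's solver.
def min_dominating_set(n, edges):
    """Exact minimum dominating set by enumerating subsets in order of size.
    Equivalent to solving the 0/1 ILP
        minimize sum_v x_v  s.t.  x_v + sum_{u in N(v)} x_u >= 1 for every v,
    but only practical for very small graphs."""
    neighbours = {v: {v} for v in range(n)}  # closed neighbourhoods N[v]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    for size in range(1, n + 1):
        for chosen in combinations(range(n), size):
            selected = set(chosen)
            # Each ILP constraint: at least one vertex of N[v] is selected.
            if all(neighbours[v] & selected for v in range(n)):
                return selected
    return set()  # only reached when n == 0

# Hypothetical image-overlap graph: an edge means two images cover the same place.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(sorted(min_dominating_set(6, edges)))  # [1, 4]
```

Because subsets are tried smallest first, the first feasible subset found is a minimum one. This matches the global optimality an exact ILP solve provides, at an exponential cost that a real solver avoids in practice.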
TreeTalk: Composition and Compression of Trees for Image Descriptions
2014
Transactions of the Association for Computational Linguistics
Our proposed system attains significantly better performance than previous approaches for both image caption generalization and generation. ...
Key algorithmic components are tree composition and compression, both integrating tree structure with sequence structure. ...
The problem of choosing phrase order together with the best parse tree of the description is a complex optimization problem, which we solve using Integer Linear Programming (ILP). ...
doi:10.1162/tacl_a_00188
fatcat:pv7me3oioravnn2m6wfvovbpzy
Multimodal Machine Learning: A Survey and Taxonomy
[article]
2017
arXiv
pre-print
This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research. ...
It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. ...
[114] first retrieve phrases that describe visually similar images and then combine them to generate novel descriptions of the query image by using Integer Linear Programming with a number of hand-crafted ...
arXiv:1705.09406v2
fatcat:262fo4sihffvxecg4nwsifoddm
Class consistent multi-modal fusion with binary features
2015
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features. ...
Many existing recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time. ...
We develop this intuition into an optimization problem that can be solved via quadratic programming for continuous features, and mixed integer programming for binary features. ...
doi:10.1109/cvpr.2015.7298841
dblp:conf/cvpr/ShrivastavaRSCD15
fatcat:xyojwbucmvevdimjd7o6o77y7e
Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization
[article]
2017
arXiv
pre-print
We tested our algorithm on a 23-day 15-camera data set (4,935 hours total), and we were able to localize a person 53.2% of the time with 69.8% precision. ...
Therefore, we propose a multi-person tracking algorithm for very long-term (e.g. month-long) multi-camera surveillance scenarios. ...
Many trackers have been formulated as a general Integer Linear Programming (ILP) problem. ...
arXiv:1604.07468v2
fatcat:glk4uwjsyvgo5cjvr3pclp32si
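The snippet notes that many trackers are formulated as general ILP problems. One of the simplest such problems is frame-to-frame data association, written as an assignment ILP. The Python sketch below illustrates only that simplified case and is not the surveillance tracker's formulation. The cost matrix and function name are hypothetical, and the sketch enumerates permutations instead of calling an ILP solver, which is only viable for a handful of tracks.

```python
from itertools import permutations

# Illustrative helper (hypothetical name), not from the paper.
def associate(cost):
    """Solve the square assignment ILP
        minimize sum_ij cost[i][j] * x_ij
        s.t. sum_j x_ij = 1, sum_i x_ij = 1, x_ij in {0, 1}
    by enumerating permutations (tiny inputs only)."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        # perm[i] is the detection assigned to track i.
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical distances between 3 existing tracks (rows) and 3 new detections (columns).
cost = [
    [4.0, 1.0, 9.0],
    [2.0, 5.0, 8.0],
    [7.0, 6.0, 3.0],
]
print(associate(cost))  # ((1, 0, 2), 6.0)
```

Each permutation is exactly one feasible 0/1 solution of the row and column constraints, so the cheapest permutation is the ILP optimum.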
Baby talk: Understanding and generating simple image descriptions
2011
CVPR 2011
The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best ...
We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human-generated reference descriptions. ...
Finally, in this paper, we formulate the optimization as an integer linear program, dictating the form of the objective function and constraints. ...
doi:10.1109/cvpr.2011.5995466
dblp:conf/cvpr/KulkarniPDLCBB11
fatcat:dow4w2mterdsbavtecuxjfd6qq
BabyTalk: Understanding and Generating Simple Image Descriptions
2013
IEEE Transactions on Pattern Analysis and Machine Intelligence
The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best ...
We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human-generated reference descriptions. ...
Finally, in this paper, we formulate the optimization as an integer linear program, dictating the form of the objective function and constraints. ...
doi:10.1109/tpami.2012.162
pmid:22848128
fatcat:qhye4obzpbcllos2dr6rohli2u
Multimodal Image Outpainting with Regularized Normalized Diversification
2020
2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
While recent approaches [32, 28] propose to maximize or preserve the pairwise distance between generated samples with respect to their latent distance, they do not explicitly prevent the diverse samples ...
Figure 1: Given only a small foreground region, our model can learn to outpaint a set of diverse and plausible missing backgrounds in both face images and street scene images. ...
Acknowledgement: We are grateful for the support of the Honda Research Institute Curious Minded Machine Program. We also gratefully acknowledge a GPU donation from NVIDIA. ...
doi:10.1109/wacv45572.2020.9093636
dblp:conf/wacv/ZhangWS20a
fatcat:4tiqyfsvvjclroesqugdbibsii
Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast
2014
International Journal of Multimedia Information Retrieval
This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data. ...
It relies on Integer Linear Programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem. ...
Acknowledgements: This work was partly realized as part of the Quaero Program and the QCompere project, respectively funded by OSEO (French State agency for innovation) and ANR (French national research ...
doi:10.1007/s13735-014-0055-y
fatcat:mlvyk5h5v5c4pmo4nvvljqs5ga
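The snippet says Integer Linear Programming is used to cluster person instances by identity. A common way to write such a clustering ILP uses one binary variable x_ij per pair, meaning "i and j are in the same cluster", together with transitivity constraints x_ij + x_jk - x_ik <= 1. Assuming an objective that maximizes total pairwise affinity (a correlation-clustering style objective, not confirmed by the snippet), the minimal Python sketch below solves it exactly on a toy input by enumerating all set partitions. Every partition is a feasible assignment of the x_ij. The weights and function names are hypothetical.

```python
from itertools import combinations

# Illustrative helpers (hypothetical names), not the paper's formulation.
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def cluster(n, weight):
    """Exact clustering of n instances: maximize the total weight of pairs placed
    in the same cluster. Each partition corresponds to a feasible 0/1 solution x_ij
    satisfying x_ij + x_jk - x_ik <= 1, so enumeration solves the ILP on toy inputs."""
    best, best_score = None, float("-inf")
    for partition in set_partitions(list(range(n))):
        score = sum(weight.get((i, j), 0.0)
                    for block in partition
                    for i, j in combinations(sorted(block), 2))
        if score > best_score:
            best, best_score = partition, score
    return sorted(sorted(block) for block in best), best_score

# Hypothetical pairwise scores between 4 person instances (positive = same identity).
weight = {(0, 1): 2.0, (0, 2): -1.0, (0, 3): -1.5,
          (1, 2): -0.5, (1, 3): -2.0, (2, 3): 1.5}
print(cluster(4, weight))  # ([[0, 1], [2, 3]], 3.5)
```

The number of partitions grows as the Bell numbers (15 for four instances), which is why practical systems hand the ILP to a solver rather than enumerating.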
On support relations and semantic scene graphs
2017
ISPRS journal of photogrammetry and remote sensing (Print)
Scene understanding is a popular and challenging topic in both computer vision and photogrammetry. A scene graph provides rich information for such scene understanding. ...
In contrast to previous methods for extracting support relations, the proposed approach generates more accurate results, and does not require a pixel-wise semantic labeling of the scene. ...
Energy minimization: The minimization of the energy function in Eq. (5) can be formulated as an integer programming problem. ...
doi:10.1016/j.isprsjprs.2017.07.010
fatcat:suxv7piwbrg4ln24wxfv3exraq
Orientation Robust Text Line Detection in Natural Images
2014
2014 IEEE Conference on Computer Vision and Pattern Recognition
Then, higher-order correlation clustering (HOCC) is used to partition the MSERs into text line candidates, using the hypotheses as soft constraints to enforce long-range interactions. ...
In this paper, higher-order correlation clustering (HOCC) is used for text line detection in natural images. ...
The original HOCC proposed a linear programming relaxation solution with a large number of inequality constraints. This complex linear system can be written elegantly in the SDP framework. ...
doi:10.1109/cvpr.2014.514
dblp:conf/cvpr/KangLD14
fatcat:jeqnmviizngmdkpa5zyfwj7sdy
Showing results 1 — 15 out of 1,409 results