On the regularization of image semantics by modal expansion

J. C. Pereira, N. Vasconcelos
2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Recent research efforts in semantic representations and context modeling are based on the principle of task expansion: vision problems such as object recognition, scene classification, or retrieval (RCR) cannot be solved in isolation. This work investigates the extended principle of modality expansion: RCR problems cannot be solved from visual information alone. A semantic image labeling system is augmented with text. Pairs of images and text are mapped to a semantic space, and the text features are used to regularize their image counterparts. This is done with a new cross-modal regularizer, which learns the mapping of the image features that maximizes their average similarity to the features derived from text. The proposed regularizer is class-sensitive, combining a set of class-specific denoising transformations with nearest-neighbor interpolation of text-based class assignments. Regularizing a state-of-the-art image retrieval approach in this way is shown to produce substantial gains in retrieval accuracy, outperforming recent image retrieval approaches.
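The abstract does not give the regularizer's exact form, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that (1) the semantic space is the probability simplex over classes, so each image or text is a vector of class posteriors; (2) "similarity" is negative squared Euclidean distance, so maximizing average similarity becomes a ridge least-squares fit of each class-specific map; and (3) the class weights come from averaging the text-based class posteriors of the `k_nn` nearest training images. The function names (`fit_map`, `fit_class_maps`, `regularize`, `to_simplex`), the parameters `lam` and `k_nn`, and the toy data are all hypothetical.

```python
import numpy as np

def fit_map(X, Y, lam=1e-3):
    """Ridge least-squares map W such that X @ W approximates Y.

    Assumption of this sketch: similarity = negative squared Euclidean
    distance, so maximizing average similarity is least squares (plus a
    small ridge term lam for numerical stability).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def fit_class_maps(X_img, X_txt, labels, n_classes, lam=1e-3):
    """Hypothetical class-specific denoising maps: one per class, each fit
    only on the image/text pairs of that class."""
    return [fit_map(X_img[labels == k], X_txt[labels == k], lam)
            for k in range(n_classes)]

def to_simplex(v, eps=1e-12):
    """Clip to positive values and renormalize into a class distribution."""
    v = np.clip(v, eps, None)
    return v / v.sum()

def regularize(x, X_img_train, text_post, maps, k_nn=3):
    """Class-sensitive regularization of one image semantic vector x.

    Class weights: mean text-based class posterior of the k_nn training
    images nearest to x (nearest-neighbour interpolation, an assumption).
    Output: weighted combination of the class-specific maps applied to x.
    """
    dists = np.linalg.norm(X_img_train - x, axis=1)
    nn = np.argsort(dists)[:k_nn]
    weights = text_post[nn].mean(axis=0)
    out = sum(w * (x @ W) for w, W in zip(weights, maps))
    return to_simplex(out)

# Hypothetical toy data: 3 classes, 3-dimensional semantic space.
# Image posteriors are "noisy"; text posteriors are sharper.
X_img = np.array([[.5, .3, .2], [.5, .2, .3],
                  [.3, .5, .2], [.2, .5, .3],
                  [.3, .2, .5], [.2, .3, .5]])
labels = np.array([0, 0, 1, 1, 2, 2])
X_txt = np.array([[.8, .1, .1]] * 2 + [[.1, .8, .1]] * 2 + [[.1, .1, .8]] * 2)

maps = fit_class_maps(X_img, X_txt, labels, n_classes=3)
x = np.array([.45, .3, .25])
out = regularize(x, X_img, X_txt, maps, k_nn=3)
print(out.round(3))
```

In this toy run the regularized vector stays on the simplex and puts more mass on the dominant class than the raw image vector does, which is the intended "denoising" effect. Using a weighted mixture of per-class maps, rather than a single global map, is how this sketch approximates the class sensitivity described in the abstract.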
doi:10.1109/cvpr.2012.6248041 dblp:conf/cvpr/PereiraV12 fatcat:pelvzgtmxne73ehsfruhchiqgy