Visual Query Expansion via Incremental Hypernetwork Models of Image and Text [chapter]

Min-Oh Heo, Myunggu Kang, Byoung-Tak Zhang
2010 Lecture Notes in Computer Science  
Humans can associate the vision and language modalities and thus generate mental imagery, i.e. visual images, from linguistic input in an environment of unlimited inflowing information. Inspired by human memory, we separate the text-to-image retrieval task into two steps: 1) text-to-image conversion (generating visual queries for the 2nd step) and 2) image-to-image retrieval. This separation is advantageous for visualizing inner representations, learning from incremental datasets, and reusing results from content-based image retrieval. Here, we propose a visual query expansion method that simulates the capability of human associative memory. We use a hypernetwork model (HN) that combines visual words and linguistic words. HNs learn higher-order cross-modal associative relationships incrementally on a sequence of image-text pairs. An incremental HN generates images by assembling visual words based on linguistic cues, and we then retrieve similar images using the generated visual query. The method is evaluated on 26 video clips of 'Thomas and Friends'. Experiments show successful image retrieval rates of up to 98.1% with a single text cue. The method also shows potential for generating visual queries from several text cues simultaneously.
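The two-step pipeline described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' hypernetwork implementation: the vocabulary, image database, and simple co-occurrence counts standing in for learned hyperedges are all hypothetical, meant only to show the flow from text cues, to an assembled visual query, to image-to-image retrieval.

```python
from collections import Counter

# Hypothetical training pairs: each links linguistic words to visual words
# (e.g. codewords for image patches). Incremental learning here is just
# accumulating cross-modal co-occurrence counts pair by pair, a crude
# stand-in for hypernetwork hyperedges learned in sequence.
pairs = [
    ({"thomas", "engine"}, {"v1", "v2", "v3"}),
    ({"thomas", "track"},  {"v2", "v3", "v4"}),
    ({"percy", "engine"},  {"v5", "v2"}),
]

assoc = {}
for words, vwords in pairs:
    for w in words:
        assoc.setdefault(w, Counter()).update(vwords)

def visual_query(text_cues, k=3):
    """Step 1: text-to-image conversion -- assemble the k visual words
    most strongly associated with the given text cues."""
    votes = Counter()
    for w in text_cues:
        votes.update(assoc.get(w, Counter()))
    return {v for v, _ in votes.most_common(k)}

def retrieve(query, database):
    """Step 2: image-to-image retrieval -- rank stored images by
    visual-word overlap with the generated visual query."""
    return max(database, key=lambda name: len(query & database[name]))

# Hypothetical image database of visual-word sets.
database = {"img_a": {"v1", "v2", "v3"}, "img_b": {"v5", "v2", "v6"}}

q = visual_query({"thomas"})
print(retrieve(q, database))  # the image sharing the most visual words with q
```

Note that `visual_query` accepts a set of cues, mirroring the abstract's point that several text cues can be combined into a single expanded visual query.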
doi:10.1007/978-3-642-15246-7_11