Text-to-image retrieval based on incremental association via multimodal hypernetworks

Jung-Woo Ha, Beom-Jin Lee, Byoung-Tak Zhang
2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Text-to-image retrieval is the task of retrieving images associated with textual queries. To be practical, a text-to-image retrieval model requires an incremental learning method, since multimodal data grow dramatically. Here we propose an incremental text-to-image retrieval method using a multimodal association model. The association model is based on a hypernetwork (HN) in which a vertex corresponds to a textual word or a visual patch and a hyperedge represents a higher-order multimodal association. In the multimodal hypernetwork-based text-to-image retrieval, the HN is learned incrementally by sequential Bayesian sampling; a given text query is cross-modally expanded into a visual query, and images similar to the expanded visual query are then retrieved. We evaluated the proposed method using 3,000 images with textual descriptions from Flickr.com. The experimental results show that the proposed method achieves competitive retrieval performance compared to a baseline method. Moreover, we demonstrate that our method provides robust text-to-image retrieval results as the data increase.
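The retrieval pipeline described in the abstract (hyperedges linking words and visual patches, cross-modal query expansion, then similarity ranking) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class `MultimodalHypernetwork`, the functions `cosine` and `retrieve`, the parameters `order` and `edges_per_item`, and the patch identifiers are all hypothetical. In particular, the paper's sequential Bayesian sampling is replaced here by plain uniform random sampling of hyperedges with additive weights, and cosine similarity over patch counts is an assumed similarity measure.

```python
import math
import random
from collections import Counter


class MultimodalHypernetwork:
    """Toy hypernetwork: each hyperedge joins a few words and a few visual patches."""

    def __init__(self, order=2, edges_per_item=20, seed=0):
        self.order = order                    # words / patches per hyperedge (assumed)
        self.edges_per_item = edges_per_item  # hyperedges sampled per training item
        self.rng = random.Random(seed)
        self.weights = Counter()              # (word set, patch set) -> weight

    def update(self, words, patches):
        """Incrementally add hyperedges sampled from one (text, image) pair.

        Assumption: uniform sampling stands in for the paper's
        sequential Bayesian sampling.
        """
        words, patches = sorted(set(words)), sorted(set(patches))
        kw, kp = min(self.order, len(words)), min(self.order, len(patches))
        if kw == 0 or kp == 0:
            return
        for _ in range(self.edges_per_item):
            edge = (frozenset(self.rng.sample(words, kw)),
                    frozenset(self.rng.sample(patches, kp)))
            self.weights[edge] += 1.0

    def expand(self, query_words):
        """Cross-modal expansion: map a text query to weighted visual patches."""
        query = set(query_words)
        visual = Counter()
        for (edge_words, edge_patches), weight in self.weights.items():
            overlap = len(edge_words & query) / len(edge_words)
            if overlap > 0:
                for patch in edge_patches:
                    visual[patch] += weight * overlap
        return visual


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(hn, query_words, image_index, top_k=5):
    """Rank images by similarity between their patches and the expanded query."""
    visual_query = hn.expand(query_words)
    scored = [(cosine(visual_query, Counter(patches)), image_id)
              for image_id, patches in image_index.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [image_id for _, image_id in scored[:top_k]]
```

Under these assumptions, new text-image pairs are absorbed by calling `update` again, without rebuilding the hyperedges learned earlier; for example, after `hn.update(["beach", "sea"], ["p_sand", "p_sea"])`, a call to `retrieve(hn, ["sea"], image_index)` ranks images by how closely their patches match the expanded query.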
doi:10.1109/icsmc.2012.6378291 dblp:conf/smc/HaLZ12 fatcat:a6bxtihwnncarfinib6e3hgpne