Unsupervised discovery of visual object class hierarchies

Josef Sivic, Bryan C. Russell, Andrew Zisserman, William T. Freeman, Alexei A. Efros
2008 IEEE Conference on Computer Vision and Pattern Recognition
Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism - animal - feline - cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous approaches for unsupervised object and scene discovery focused on partitioning the visual data into a set of non-overlapping classes of equal ... In this work, we propose to group visual objects using a multi-layer hierarchy tree that is based on common visual elements. This is achieved by adapting to the visual domain the generative Hierarchical Latent Dirichlet Allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text. Images are modeled using quantized local image regions as analogues to words in text. Employing the multiple segmentation framework of Russell et al. [22], we show that meaningful object hierarchies, together with object segmentations, can be automatically learned from unlabeled and unsegmented image collections without supervision. We demonstrate improved object classification and localization performance using hLDA over the previous non-hierarchical method on the MSRC dataset [33].
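The "visual words" idea in the abstract (quantized local image regions playing the role of words in text) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the descriptors, the vocabulary size, or how the codebook is built, so the 2-D toy descriptors, the three-entry codebook, and the function names `quantize_to_visual_words` and `bag_of_words` are all assumptions made for illustration. The sketch only shows nearest-codeword assignment followed by a word-count histogram, the kind of representation a topic model such as hLDA would take as input.

```python
import numpy as np

def quantize_to_visual_words(descriptors, codebook):
    """Assign each local descriptor to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every descriptor and every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)

def bag_of_words(word_ids, vocab_size):
    """Count how often each visual word occurs in one image (or one segment)."""
    return np.bincount(word_ids, minlength=vocab_size)

# Hypothetical toy data: 3 codewords and 4 descriptors in a 2-D space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [4.8, 5.1], [1.1, 0.9]])

word_ids = quantize_to_visual_words(descriptors, codebook)
histogram = bag_of_words(word_ids, vocab_size=len(codebook))
print(word_ids)   # [0 1 2 1]
print(histogram)  # [1 2 1]
```

In this toy setting each image (or each candidate segment) becomes a histogram over the vocabulary, which is the text-document analogue the abstract refers to.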
doi:10.1109/cvpr.2008.4587622 dblp:conf/cvpr/SivicRZFE08 fatcat:hbwnebsxmfbv3ohazsjhjmotu4