Exploring Locally Rigid Discriminative Patches for Learning Relative Attributes

Yashaswi Verma, CV Jawahar
Proceedings of the British Machine Vision Conference (BMVC), 2015
Figure 1: Approach overview. Given a test pair, its patch-based representation is first computed. Using this representation, its analogous training pairs are identified. These pairs are used to learn a (local) ranking function, which is finally used for relative attribute prediction ("smiling" in the illustration above).

Relative attributes help in comparing two images based on their visual properties [4]. They are of great interest because they have been shown to be useful in several
related problems such as recognition, retrieval, and understanding image collections in general. In the recent past, quite a few techniques (such as [3, 4, 5, 6]) have been proposed for the relative attribute learning task that give reasonable performance. However, these have focused either on the algorithmic aspect or on the representational aspect. In this work, we revisit these approaches and integrate their broader ideas to develop simple baselines. These not only take care of the algorithmic aspects, but also take a step towards analyzing a simple yet domain-independent patch-based representation [1] for this task.

Given an image, we compute HOG descriptors from non-overlapping square patches and concatenate them. This basic representation efficiently captures local shape in an image, as well as spatially rigid correspondences across regions in an image pair. The motivation behind using it for the relative attribute learning task is the observation that images in several domain-specific datasets (such as shoes and faces) are largely aligned, and spatial variations in the regions of interest are globally minimal (Figure 2).

We integrate this representation with two state-of-the-art approaches: (i) "Global" [4], which learns a single, globally trained ranking model (Ranking SVM [2]) for each attribute; and (ii) "LocalPair" [6], which uses a ranking model trained locally using analogous training pairs for each test pair. Another variant, "LocalPair+ML", uses a learned distance metric while computing the analogous pairs. The motivation behind the LocalPair approach is that as visual differences within an image pair become more and more subtle, a single prediction model trained on the whole dataset may become inaccurate: it captures only the coarse details and smoothens the fine-grained properties. This approach therefore considers, for each test pair, only the few training pairs that are most analogous to it.
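The patch-based representation described above can be sketched as follows. This is a minimal illustration that substitutes a simple gradient-orientation histogram per patch for a full HOG descriptor; the patch size and bin count are hypothetical choices, not the paper's actual parameters.

```python
import numpy as np

def patch_descriptor(image, patch_size=16, n_bins=9):
    """Concatenate one gradient-orientation histogram per non-overlapping
    square patch, traversed in raster order so that the same feature index
    always corresponds to the same spatial location (giving rigid
    correspondences across aligned images). A simplified stand-in for HOG."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = image.shape
    feats = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            b = bins[y:y + patch_size, x:x + patch_size].ravel()
            m = mag[y:y + patch_size, x:x + patch_size].ravel()
            hist = np.bincount(b, weights=m, minlength=n_bins)
            feats.append(hist / (np.linalg.norm(hist) + 1e-8))
    return np.concatenate(feats)
```

Because the patches are non-overlapping and concatenated in a fixed order, comparing two aligned images dimension-by-dimension compares the same spatial region in both, which is what the approach exploits on aligned datasets such as faces and shoes.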
These can be thought of as the K training pairs that are most similar to the given test pair. In LocalPair+ML, a learned distance metric is used to give more importance to those feature dimensions that are most representative of a particular attribute while computing the analogous pairs. Using the identified pairs, both LocalPair and LocalPair+ML learn a local (specific to the given test pair) ranking model similar to [4]. Note that the "Global" approach can be thought of as a special case of the LocalPair approach in which K is the total number of training pairs, so that all of them are considered while learning the ranking model. This is illustrated in Figure 1.

We refer to the above baselines as Global+Hog, LocalPair+Hog and LocalPair+ML+Hog. These baselines are extensively evaluated on three challenging relative attribute datasets: OSR (natural outdoor scenes), LFW-10 (faces) and UT-Zap50K (shoes). While comparing with previous works, we use the representations used by them (wherever applicable). Table 1 summarizes the quantitative results. We observe that the baselines achieve promising results on the OSR and LFW-10 datasets, and perform better than the current state-of-the-art on the UT-Zap50K dataset (note that the UT-Zap50K-2 dataset, with fine-grained within-pair visual differences, is the most challenging among these datasets). For detailed comparisons, please refer to the paper.
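The LocalPair idea can be sketched as below: select the K training pairs nearest to the test pair, fit a local ranker on their difference vectors, and score the test pair. This is a hedged illustration, not the paper's implementation: it uses a least-squares fit on difference vectors as a lightweight stand-in for Ranking SVM, and all names and parameter values (`K`, the pair-distance definition) are hypothetical. Passing a positive-definite `metric` matrix corresponds to the LocalPair+ML variant.

```python
import numpy as np

def local_rank_predict(test_pair, train_pairs, train_labels, K=50, metric=None):
    """LocalPair-style prediction sketch.

    train_pairs : list of (u, v) feature-vector tuples
    train_labels: +1 if the first image has more of the attribute, else -1
    metric      : optional matrix M for a Mahalanobis-style distance
                  (LocalPair+ML); identity recovers plain LocalPair
    """
    xa, xb = test_pair
    M = np.eye(xa.size) if metric is None else metric

    def d(u, v):
        diff = u - v
        return float(diff @ M @ diff)

    # Analogous pairs: sum of image-to-image distances (one simple choice
    # of pair distance); keep the K nearest training pairs.
    dists = [d(xa, p[0]) + d(xb, p[1]) for p in train_pairs]
    idx = np.argsort(dists)[:K]

    # Local ranking model: fit w so that w . (u - v) matches the ordinal
    # label on the selected pairs (least-squares stand-in for Ranking SVM).
    D = np.array([train_pairs[i][0] - train_pairs[i][1] for i in idx])
    y = np.array([train_labels[i] for i in idx], dtype=float)
    w, *_ = np.linalg.lstsq(D, y, rcond=None)

    # > 0 means the first test image is predicted to have more of the attribute.
    return float(w @ (xa - xb))
```

Setting K to the total number of training pairs makes every call use the same global model, which mirrors the observation that "Global" is a special case of LocalPair.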
doi:10.5244/c.29.170