Understanding images via visual similarity and deep feature representations

Mai Lan Ha, Universitätsbibliothek Siegen
Machine Learning and Computer Vision are often thought to concern only machines, involving the development of algorithms and teaching computers to perform various tasks. However, human vision and perception are hidden factors that shape how an algorithm should function, or how we would want a computer to "see". This thesis pursues two goals: the study of perceptual visual similarity, and the study of feature representations in Deep Convolutional Neural Networks (DCNNs).

Assessing visual similarity in the wild, a core ability of the human visual system, is a challenging problem for Computer Vision because of its subjective nature and the ambiguity of its problem definition. The first goal of the thesis is therefore to study the fundamental problems of visual similarity. We ask whether similarity can be broken down into different aspects that make its study more tractable and computationally feasible. We study color composition similarity in depth, from human evaluation to its modeling with DCNNs, and apply the resulting models to create a new global color similarity descriptor and a new color transfer method. We then couple color composition and category similarities to define a new model of visual similarity; the combination leads to better results in fine-grained image retrieval. Our approach is a proof of concept, showing that such subjective phenomena can be made scientifically tractable. We also develop a perceptually inspired metric for evaluating intrinsic imaging methods, yielding a fairer evaluation than previous metrics.

The second goal of the thesis is to investigate which features are embedded in different parts of a DCNN, how they can be used efficiently, and how they can be improved. On the one hand, low- to mid-level features, ranging from image pixels to the responses of successive convolutional layers, are used in perceptual metrics and visual similarity. On the other hand, we discover shape information "hidden" in the high-level features of a DCNN trained for classification [...]
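The abstract does not spell out how the coupled similarity model is computed. Purely as an illustrative sketch, a weighted combination of aspect-wise similarities over deep features might look like the following; the cosine comparison, the equal default weighting, and the idea that color and category features come from separate extractors are assumptions, not the thesis's actual formulation:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_similarity(color_a: np.ndarray, color_b: np.ndarray,
                        cat_a: np.ndarray, cat_b: np.ndarray,
                        w_color: float = 0.5) -> float:
    """Illustrative coupling of color-composition and category similarity.

    color_* and cat_* stand in for features produced by two (hypothetical)
    DCNN-based extractors; w_color balances the two aspects.
    """
    s_color = cosine_sim(color_a, color_b)
    s_category = cosine_sim(cat_a, cat_b)
    return w_color * s_color + (1.0 - w_color) * s_category
```

For retrieval, gallery images would then be ranked by this combined score against the query; tuning `w_color` trades off color fidelity against semantic agreement.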
doi:10.25819/ubsi/7398