Online Cross-Modal Adaptation for Audio–Visual Person Identification With Wearable Cameras

Alessio Brutti, Andrea Cavallaro
2016 IEEE Transactions on Human-Machine Systems  
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online, so models can be improved as new unlabelled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, because of an appropriate selection of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful for temporal intervals during which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multi-modal person identification application on two challenging real-world datasets and show that it successfully adapts models in the presence of mild mismatch. We also show that the proposed approach is beneficial to other multi-modal score fusion algorithms.
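The core idea of the abstract (each modality's model is updated online, weighted by the posterior of the *other* modality) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the Gaussian-mean target models, the softmax posterior, and the adaptation rate `alpha` are all assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a score vector."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

class ModalityModel:
    """Hypothetical per-target mean models for one modality."""
    def __init__(self, means):
        self.means = np.asarray(means, dtype=float)  # (n_targets, dim)

    def posterior(self, x):
        # Posterior over targets from negative squared distance to each mean.
        d2 = ((self.means - x) ** 2).sum(axis=1)
        return softmax(-d2)

    def adapt(self, x, other_posterior, alpha=0.1):
        # Cross-modal, unsupervised update: shift each target mean toward the
        # new observation x, weighted by the OTHER modality's posterior, so a
        # confident modality guides the adaptation of the other one.
        w = alpha * other_posterior[:, None]
        self.means = (1.0 - w) * self.means + w * x

# Usage: the audio posterior drives adaptation of the visual model.
audio = ModalityModel([[0.0, 0.0], [5.0, 5.0]])
video = ModalityModel([[1.0, 1.0], [4.0, 4.0]])
x_audio, x_video = np.array([0.2, 0.1]), np.array([1.1, 0.9])
video.adapt(x_video, audio.posterior(x_audio))
```

A small `alpha` plays the role of the "appropriate selection of the adaptation parameters" mentioned above: when the guiding modality is uncertain, its flat posterior spreads the update thinly and accurate models are barely perturbed.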
doi:10.1109/thms.2016.2620110