Selecting the best faces to index presentation videos

Michele Merler, John R. Kender
2011 Proceedings of the 19th ACM international conference on Multimedia - MM '11  
We propose a system to select the most representative faces in unstructured presentation videos with respect to two criteria: to optimize matching accuracy between pairs of face tracks, and to select humanly preferred face icons for indexing purposes. We first extract face tracks using state-of-the-art face detection and tracking. A small subset of images is then selected per track in order to maximize matching accuracy between tracks. Finally, representative images are extracted for each […] in order to build a face index of the video. We tested our approach on 3 unstructured presentation videos of approximately 45 minutes each, for a total of a quarter million frames. Compared to the standard min-min approach, our method achieves higher track matching accuracy (94.22%) while using 6% of the running time. Using an optimal combination of 3 user preference measures, we were able to build face indexes containing 54 speakers (out of the 58 present in the videos) indexing into 795 detected tracks.
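The abstract contrasts per-track image selection with the standard min-min approach. As a rough illustration only, and not the authors' implementation, the min-min distance between two tracks compares every frame of one track with every frame of the other and keeps the smallest distance, so its cost grows with the product of the two track lengths. Comparing only k selected frames per track reduces this to k × k comparisons. Several details here are assumptions: the faces are represented as plain feature vectors, the distance is Euclidean, and the function names are hypothetical. The farthest-point selection rule is also a stand-in, since the abstract does not state the paper's actual selection criterion.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # assumed: each face image reduced to a feature vector


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two face descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_min_distance(track_a: List[Vector], track_b: List[Vector]) -> float:
    """Min-min track distance: smallest distance over all |A| * |B| frame pairs."""
    return min(euclidean(a, b) for a in track_a for b in track_b)


def farthest_point_subset(track: List[Vector], k: int) -> List[Vector]:
    """Hypothetical stand-in for per-track selection: greedy farthest-point
    sampling, which picks k mutually distant frames. NOT the paper's criterion."""
    if k >= len(track):
        return list(track)
    chosen = [0]
    # distance from each frame to its nearest already-chosen frame
    nearest = [euclidean(f, track[0]) for f in track]
    while len(chosen) < k:
        candidates = [i for i in range(len(track)) if i not in chosen]
        idx = max(candidates, key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(nearest[i], euclidean(track[i], track[idx]))
                   for i in range(len(track))]
    return [track[i] for i in chosen]


def subset_min_min_distance(track_a: List[Vector], track_b: List[Vector], k: int) -> float:
    """Min-min distance restricted to k selected frames per track (at most k * k pairs)."""
    return min_min_distance(farthest_point_subset(track_a, k),
                            farthest_point_subset(track_b, k))


if __name__ == "__main__":
    track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    track_b = [(3.0, 1.0), (5.0, 1.0)]
    print(min_min_distance(track_a, track_b))            # 8 pairs compared
    print(subset_min_min_distance(track_a, track_b, 2))  # 4 pairs compared
```

In this toy example, both calls return 1.0, but the subset version compares 4 pairs instead of 8. On long tracks, the gap between |A| × |B| and k × k grows quickly. Whether the selected subset preserves, or as the abstract reports even improves, matching accuracy depends entirely on the selection rule, which this sketch does not reproduce.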
doi:10.1145/2072298.2072040 dblp:conf/mm/MerlerK11 fatcat:hcsavuvzyzg4joboiuf7hdwlky