Towards a comprehensive computational model for aesthetic assessment of videos
Proceedings of the 21st ACM international conference on Multimedia - MM '13
In this paper, we propose a novel aesthetic model that emphasizes psychovisual statistics extracted from multiple levels, in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness, and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, that invoke
... c sentiments such as "colorful clouds", "smiling face", etc., and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewers' fixation priors, the trajectories are labeled as foreground or background/camera, and statistics are computed on them. Additionally, spatio-temporal local binary patterns are computed to capture texture variations in a given shot. Classifiers are trained on each feature representation independently. After a thorough evaluation of 9 different types of features, we select the best features from each level: dark-channel, affect, and camera-motion statistics. Next, the corresponding classifier scores are integrated in a low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast-quality videos released by NHK as an aesthetic evaluation dataset.
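
As an illustration of the lowest-level features, the following is a minimal sketch of dark-channel statistics computed over rectangular cells of a single frame. It uses the common definition of the dark channel (the minimum over color channels, followed by a minimum over a local patch). The patch size, the 2x2 cell grid, the use of the per-cell mean as the statistic, and the function names `dark_channel` and `cell_means` are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def dark_channel(frame, patch=3):
    """Dark channel of an H x W x 3 frame.

    Takes the minimum over the color channels at each pixel, then the
    minimum over a patch x patch neighborhood (patch size is an assumption).
    """
    min_rgb = frame.min(axis=2)
    pad = patch // 2
    # Replicate border pixels so every pixel has a full neighborhood.
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.full((h, w), np.inf)
    # Sliding-window minimum built from shifted views of the padded image.
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


def cell_means(channel, rows=2, cols=2):
    """Mean of `channel` inside each cell of a rows x cols grid (assumed layout)."""
    h, w = channel.shape
    ys = np.linspace(0, h, rows + 1).astype(int)
    xs = np.linspace(0, w, cols + 1).astype(int)
    return np.array([
        [channel[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean() for j in range(cols)]
        for i in range(rows)
    ])


if __name__ == "__main__":
    # Synthetic frame with values in [0, 1]; a real frame would be decoded from video.
    rng = np.random.default_rng(0)
    frame = rng.random((120, 160, 3))
    features = cell_means(dark_channel(frame), rows=2, cols=2).ravel()
    print(features.shape)
```

The flattened per-cell values form one frame-level feature vector. In a pipeline like the one described, such vectors would be fed to a classifier trained on this feature type alone.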