Video Scene Understanding Using Multi-scale Analysis

Yang Yang, Jingen Liu, Mubarak Shah
2009 IEEE 12th International Conference on Computer Vision
We propose a novel method for automatically discovering the key motion patterns in a scene by observing it for an extended period. Our method does not rely on object detection and tracking; it uses only a low-level feature, the direction of pixel-wise optical flow. We first divide the video into clips and estimate a sequence of flow fields. Each moving pixel is quantized based on its location and motion direction, which yields what is essentially a bag-of-words representation of each clip. Once the bag-of-words representation is obtained, we proceed to a screening stage that uses a measure called the 'conditional entropy' to select useful words. We then apply diffusion maps to these words. The diffusion maps framework embeds the manifold points into a lower-dimensional space while preserving the intrinsic local geometric structure. Finally, the useful words in the lower-dimensional space are clustered to discover key motion patterns. The diffusion map embedding involves a diffusion time parameter, which gives us the ability to detect key motion patterns at different scales using multi-scale analysis. In addition, clips represented in terms of the frequency of motion patterns can also be clustered to determine multiple dominant motion patterns that occur simultaneously, providing further understanding of the scene. We have tested our approach on two challenging datasets and obtained interesting and promising results.
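Two of the steps named in the abstract, quantizing moving pixels into words and embedding with a diffusion time parameter, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `flow_to_words` and `diffusion_map`, the grid cell size, the number of direction bins, the magnitude threshold, the Gaussian kernel, and its bandwidth `sigma` are all assumptions chosen for illustration. The abstract does not give the paper's actual settings, and the conditional-entropy screening and the final clustering are omitted here.

```python
import numpy as np

def flow_to_words(flow, cell=10, n_dirs=4, min_mag=1.0):
    """Histogram of (grid cell, direction bin) words for one flow field.

    flow has shape (H, W, 2) holding (dx, dy) per pixel. The cell size,
    direction bin count, and magnitude threshold are illustrative only.
    """
    h, w, _ = flow.shape
    dx, dy = flow[..., 0], flow[..., 1]
    # Keep only pixels that move by at least min_mag.
    ys, xs = np.nonzero(np.hypot(dx, dy) >= min_mag)
    # Direction angle in [0, 2*pi), binned into n_dirs equal sectors.
    ang = np.arctan2(dy[ys, xs], dx[ys, xs]) % (2 * np.pi)
    d = np.minimum((ang / (2 * np.pi) * n_dirs).astype(int), n_dirs - 1)
    rows, cols = -(-h // cell), -(-w // cell)  # ceiling division
    word = ((ys // cell) * cols + (xs // cell)) * n_dirs + d
    return np.bincount(word, minlength=rows * cols * n_dirs)

def diffusion_map(X, sigma=1.0, t=1, n_components=2):
    """Basic diffusion map of the rows of X at integer diffusion time t.

    Uses a Gaussian kernel (an assumption). Larger t damps the
    coordinates tied to small eigenvalues, giving a coarser view.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    deg = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as P = D^-1 K.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)  # guard tiny negative round-off
    psi = vecs[:, order] / np.sqrt(deg)[:, None]  # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

In a pipeline of this kind, per-frame histograms from `flow_to_words` would be summed over each clip, and `diffusion_map` would be run on the feature vectors of the retained words at several values of `t` to compare groupings across scales.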
doi:10.1109/iccv.2009.5459376 dblp:conf/iccv/YangLS09 fatcat:jwgqcvcsnvdupo3omq2ahmbmdy