Learning articulated appearance models for tracking humans: A spectral graph matching approach

Nicolas Thome, Djamel Merad, Serge Miguet
2008 Signal processing. Image communication  
Tracking an unspecified number of people in real-time is one of the most challenging tasks in computer vision. In this paper, we propose an original method to achieve this goal, based on the construction of a 2D human appearance model. The general framework, which is a region-based tracking approach, is applicable to any type of object. We show how to specialize the method for taking advantage of the structural properties of the human body. We segment its visible parts by using a skeletal graph
more » ... matching strategy inspired by the shock graphs. Only morphological and topological information is encoded in the model graph, making the approach independent of the pose of the person, the viewpoint, the geometry or the appearance of the limbs. The limbs labeling makes it possible to build and update an appearance model for each body part. The resulting discriminative feature, that we denote as an articulated appearance model, captures both color, texture and shape properties of the different limbs. It is used to identify people in complex situations (occlusion, field of view exit, etc.), and maintain the tracking. The model to image matching has proved to be much more robust and better-founded than with existing global appearance descriptors, specifically when dealing with highly deformable objects such as humans. The only assumption for the recognition is the approximate viewpoint correspondence between the different models during the matching process. The method does not make use of skin color detection, which allows us to perform tracking under any viewpoint. Occlusions can be detected by the generic part of the algorithm, and the tracking is performed in such cases by means of a particle filter. Several results in complex situations prove the capacity of the algorithm to learn people appearance in unspecified poses and viewpoints, and its efficiency for tracking multiple humans in real-time using the specific updated descriptors. Finally, the model provides an important clue for further human motion analysis process.
doi:10.1016/j.image.2008.09.003 fatcat:7plyxv5tmra2nkfulh3gzrndhu