Regressing Local to Global Shape Properties for Online Segmentation and Tracking

Carl Yuheng Ren, Victor Prisacariu, Ian Reid
2013 International Journal of Computer Vision  
We propose a regression-based learning framework that learns a set of shapes online, which can then be used to recover occluded object shapes. We represent shapes using their 2D discrete cosine transforms (DCT), and our key insight is to regress low frequency harmonics, which represent the global properties of the shape, from high frequency harmonics, which encode the details of the object's shape. We learn the regression model using Locally Weighted Projection Regression (LWPR), which expedites online, incremental learning. After sufficient observation of a set of unoccluded shapes, the learned model can detect occlusion and recover the full shapes from the occluded ones.

Our shape regression method is linked to the pixel-wise posteriors (PWP) level set-based tracker of [1]. The PWP tracker obtains the target pose (a 6 DoF 2D affinity or 4 DoF 2D similarity transform) and figure/ground segmentation at each frame. We use the pose to align the shapes and then add them to the learning framework. After a burn-in period, the framework is able to recover occluded shapes in real time. We demonstrate the ideas using the PWP tracker; however, the framework could be embedded in any segmentation-based tracking system.

We use the DCT to represent a silhouette mask image (i.e. a binary image of the figure/ground segmentation, with 1 for foreground and -1 for background), so that the shape representation becomes a set of DCT coefficients. The transform yields a natural hierarchical representation of a shape in which the top-left, low frequency coefficients in the DCT capture the overall shape, while the high frequency coefficients (further away from the top-left) capture the details of the shape.

We use LWPR [3] as our regression model. LWPR is based on the hypothesis that high dimensional data are characterized by locally low dimensional distributions. A learned LWPR model has K local models, each comprising a Receptive Field (RF) characterized by a field center c_k and a positive semi-definite distance metric D_k that determines the size and shape of the neighborhood contributing to the local model, and a locally weighted partial least squares (LWPLS) regression model characterized by a set of projections u_k and their respective weights β_k. Given a set of high frequency DCT coefficients as input x_hf, the RF weight, also known as the activation, of the k-th local model is computed as:
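
In the LWPR literature the receptive field activation is usually a Gaussian kernel, w_k = exp(-1/2 (x_hf - c_k)^T D_k (x_hf - c_k)). The sketch below assumes that standard form and illustrates the pipeline described above on a toy example: it takes the DCT of a ±1 silhouette mask, splits the coefficients into low and high frequency sets, and evaluates the activation of one local model. The 8x8 mask, the 2x2 low frequency block, the identity metric, and all function names are illustrative assumptions, not values or code from the paper.

```python
import math

def dct2(mask):
    """Orthonormal 2D DCT-II of a square N x N image given as a list of lists."""
    n = len(mask)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # 1D DCT-II basis: basis[k][i] = scale(k) * cos(pi * (2i + 1) * k / (2n))
    basis = [[scale(k) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i in range(n)] for k in range(n)]
    # Separable transform: apply the basis along each row, then along each column.
    rows = [[sum(basis[v][j] * mask[i][j] for j in range(n)) for v in range(n)]
            for i in range(n)]
    return [[sum(basis[u][i] * rows[i][v] for i in range(n)) for v in range(n)]
            for u in range(n)]

def split_coefficients(coeffs, low_size):
    """Split into the top-left low_size x low_size block (low) and the rest (high).

    Assumption: a square block cutoff; the paper's exact split is not given here.
    """
    low, high = [], []
    for u, row in enumerate(coeffs):
        for v, c in enumerate(row):
            (low if u < low_size and v < low_size else high).append(c)
    return low, high

def rf_activation(x, center, metric):
    """Gaussian RF activation exp(-0.5 * (x - c)^T D (x - c)) (assumed standard form)."""
    d = [xi - ci for xi, ci in zip(x, center)]
    quad = sum(d[i] * metric[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return math.exp(-0.5 * quad)

# Toy silhouette: +1 inside a centered 4x4 square, -1 for the background.
N = 8
mask = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else -1.0 for j in range(N)]
        for i in range(N)]

coeffs = dct2(mask)
x_lf, x_hf = split_coefficients(coeffs, low_size=2)

# Hypothetical local model: identity metric, field center at the observed input.
identity = [[1.0 if i == j else 0.0 for j in range(len(x_hf))]
            for i in range(len(x_hf))]
w_at_center = rf_activation(x_hf, x_hf, identity)

# Moving the center one unit away along the first coordinate lowers the activation.
shifted_center = [x_hf[0] + 1.0] + x_hf[1:]
w_shifted = rf_activation(x_hf, shifted_center, identity)
```

In a full LWPR model, each of the K local models would produce its own prediction of the low frequency coefficients, and the predictions would be combined using these activations as weights; that step is omitted from this sketch.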
doi:10.1007/s11263-013-0635-y fatcat:xcw7nwrsbrhuxkbklbnlfc26wu