Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance
Proceedings of the eleventh ACM international conference on Multimedia - MULTIMEDIA '03
We present a framework for multi-camera video surveillance. The framework consists of three phases: detection, representation, and recognition. The detection phase fuses video streams from multiple cameras to extract motion trajectories efficiently and reliably. The representation phase summarizes raw trajectory data to construct hierarchical, invariant, and content-rich descriptions of the motion events. Finally, the recognition phase performs event classification and identification on the data descriptors. Because of space limits, we describe only briefly how we detect and represent events, but we provide an in-depth treatment of the third phase: event recognition. For effective recognition, we devise a sequence-alignment kernel function to perform sequence-data learning for identifying suspicious events. We show that when the positive training instances (i.e., suspicious events) are significantly outnumbered by the negative training instances (i.e., benign events), SVMs (or any other learning methods) can suffer a high incidence of errors. To remedy this problem, we propose the kernel boundary alignment (KBA) algorithm to work with the sequence-alignment kernel. Through an empirical study in a parking-lot surveillance setting, we show that our spatio-temporal fusion scheme and biased sequence-data learning method are highly effective in identifying suspicious events.
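The abstract does not specify the form of the sequence-alignment kernel, so the following is only a minimal sketch of the general idea of comparing trajectories by alignment, not the authors' kernel and not the KBA algorithm. It assumes, hypothetically, that each trajectory has already been summarized as a string of motion symbols (for example `F` for forward, `L` for left turn, `R` for right turn), scores two strings with a standard Needleman-Wunsch global alignment, and maps the score to a similarity in (0, 1]. The function names, the symbol encoding, and the parameters `match`, `mismatch`, `gap`, and `beta` are all illustrative assumptions.

```python
import math


def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score of two symbol sequences."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def alignment_similarity(a, b, beta=0.5):
    """Map alignment scores to a similarity in (0, 1].

    With the default scores, the value is 1.0 exactly when a == b.
    Illustrative only: this is not the kernel from the paper.
    """
    shortfall = (0.5 * (alignment_score(a, a) + alignment_score(b, b))
                 - alignment_score(a, b))
    return math.exp(-beta * shortfall)


if __name__ == "__main__":
    # Hypothetical symbol strings for three trajectories.
    for other in ("FFLFF", "FFRFF", "FFFF"):
        print(other, alignment_similarity("FFLFF", other))
```

A similarity built this way is symmetric, but it is not guaranteed to be positive semidefinite, so it would need to be checked or corrected before being used as a kernel matrix in an SVM. It also does nothing about the class imbalance between suspicious and benign events that the abstract describes.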