Human Action Recognition from Depth Videos Using Pool of Multiple Projections with Greedy Selection
IEICE transactions on information and systems
Depth-based action recognition has been attracting the attention of researchers because of the advantages of depth cameras over standard RGB cameras. One of these advantages is that depth data can provide richer information from multiple projections. In particular, multiple projections can be used to extract discriminative motion patterns that would not be discernible from one fixed projection. However, high computational costs have meant that recent studies have exploited only a small number
... ly a small number of projections, such as front, side, and top. Thus, a large number of projections, which may be useful for discriminating actions, are discarded. In this paper, we propose an efficient method to exploit pools of multiple projections for recognizing actions in depth videos. First, we project 3D data onto multiple 2D-planes from different viewpoints sampled on a geodesic dome to obtain a large number of projections. Then, we train and test action classifiers independently for each projection. To reduce the computational cost, we propose a greedy method to select a small yet robust combination of projections. The idea is that best complementary projections will be considered first when searching for optimal combination. We conducted extensive experiments to verify the effectiveness of our method on three challenging benchmarks: MSR Action 3D, MSR Gesture 3D, and 3D Action Pairs. The experimental results show that our method outperforms other state-of-the-art methods while using a small number of projections.