Multi Modal RGB D Action Recognition with CNN LSTM Ensemble Deep Network

D. Srihari, P. V.
2020 International Journal of Advanced Computer Science and Applications  
Human action recognition has transformed from a video processing problem into a multimodal machine learning problem. The objective of this work is to perform multimodal human action recognition with an ensemble hybrid network of CNN and LSTM layers. The proposed CNN-LSTM ensemble network is a two-stream framework in which one ensemble stream learns RGB sequences and the other learns depth sequences. This framework can learn temporal and spatial dynamics in both the RGB and depth modalities of action data. The [...] hybrid network is found to be receptive to both spatial and temporal fields because of the hierarchical structure of CNNs and LSTMs. Finally, to test the proposed model, we used our own BVCAction3D dataset and three benchmark RGB-D action datasets. Experiments were conducted on all the datasets using the proposed framework, which was found to be effective compared with similar deep learning architectures.
doi:10.14569/ijacsa.2020.0111284 fatcat:h63esrv6pfhljkzt7xdy6ygypa