From pose to activity: Surveying datasets and introducing CONVERSE

Michael Edwards, Jingjing Deng, Xianghua Xie
2016 Computer Vision and Image Understanding  
We present a review on the current state of publicly available datasets within the human action recognition community; highlighting the revival of pose based methods and recent progress of understanding person-person interaction modeling. We categorize datasets regarding several key properties for usage as a benchmark dataset; including the number of class labels, ground truths provided, and application domain they occupy. We also consider the level of abstraction of each dataset; grouping
more » ... that present actions, interactions and higher level semantic activities. The survey identifies key appearance and pose based datasets, noting a tendency for simplistic, emphasized, or scripted action classes that are often readily definable by a stable collection of subaction gestures. There is a clear lack of datasets that provide closely related actions, those that are not implicitly identified via a series of poses and gestures, but rather a dynamic set of interactions. We therefore propose a novel dataset that represents complex conversational interactions between two individuals via 3D pose. 8 pairwise interactions describing 7 separate conversation based scenarios were collected using two Kinect depth sensors. The intention is to provide events that are constructed from numerous primitive actions, interactions and motions, over a period of time; providing a set of subtle action classes that are more representative of the real world, and a challenge to currently developed recognition methodologies. We believe this is among one of the first datasets devoted to conversational interaction classification using 3D pose
doi:10.1016/j.cviu.2015.10.010 fatcat:dr6riozhe5bsjfz5iqzcpizufm