Deep Learning Approach for Imputation of Missing Values in Actigraphy Data: Algorithms Development Study (Preprint)

Jong-Hwan Jang, Junggu Choi, Hyun Woong Roh, Sang Joon Son, Chang Hyung Hong, Eun Young Kim, Tae Young Kim, Dukyong Yoon
2019 JMIR mHealth and uHealth  
Data collected by an accelerometer device worn on the wrist or waist can provide objective measurements for studies related to physical activity. However, some portion of the data cannot be used because of missing values. In previous studies, statistical methods have been applied to impute missing values on the basis of statistical assumptions. Deep learning algorithms, however, can learn features from the data themselves without any assumptions and may outperform previous approaches in ... on tasks. The aim of this study was to impute missing values in accelerometer data using a deep learning approach that performs better than conventional approaches. To develop an imputation model for missing values in accelerometer data, a denoising convolutional autoencoder was adopted. We trained and tested our deep learning-based imputation model with the National Health and Nutrition Examination Survey (NHANES) dataset and validated it with the external Korea National Health and Nutrition Examination Survey (KNHANES) and the Korean Chronic Cerebrovascular Disease Oriented Biobank (KCCDB) datasets. The partial root mean squared error (PRMSE) and partial mean absolute error (PMAE) of the imputed parts were used for a performance comparison with previous approaches (mean imputation, zero-inflated Poisson [ZIP] regression, and Bayesian regression). Our model exhibited a PRMSE of 839.3 counts per minute (cpm) and PMAE of 431.1 cpm, whereas mean imputation showed a PRMSE of 1,053.2 cpm and PMAE of 545.4 cpm, the ZIP model achieved a PRMSE of 1,255.6 cpm and PMAE of 508.6 cpm, and Bayesian regression showed a PRMSE of 924.5 cpm and PMAE of 605.8 cpm. In this study, the proposed deep learning model for imputing missing values in accelerometer activity data performed better than the other methods.
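The abstract reports errors "of the imputed parts" only, which suggests the metrics are restricted to the positions that were missing. The following is a minimal Python sketch of that idea, not the authors' implementation: the function names `partial_errors` and `mean_impute`, the boolean mask convention, and the toy counts are all hypothetical, and the mean-imputation baseline here simply fills gaps with the mean of the observed values in the same series, which may differ from the variant used in the study.

```python
import math


def partial_errors(true_counts, imputed_counts, missing_mask):
    """Return (PRMSE, PMAE) over the positions flagged as missing only.

    Assumption: "partial" means the error is averaged over imputed
    positions and observed positions are ignored.
    """
    diffs = [
        imputed - true
        for true, imputed, is_missing in zip(true_counts, imputed_counts, missing_mask)
        if is_missing
    ]
    if not diffs:
        raise ValueError("missing_mask selects no positions")
    prmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pmae = sum(abs(d) for d in diffs) / len(diffs)
    return prmse, pmae


def mean_impute(counts, missing_mask):
    """Illustrative baseline: fill masked positions with the observed mean."""
    observed = [c for c, is_missing in zip(counts, missing_mask) if not is_missing]
    fill_value = sum(observed) / len(observed)
    return [fill_value if is_missing else c for c, is_missing in zip(counts, missing_mask)]


# Toy activity counts (cpm); two minutes are artificially masked as missing.
true_counts = [0, 120, 300, 450, 80, 0]
missing_mask = [False, False, True, True, False, False]

imputed = mean_impute(true_counts, missing_mask)
prmse, pmae = partial_errors(true_counts, imputed, missing_mask)
print(f"PRMSE = {prmse:.1f} cpm, PMAE = {pmae:.1f} cpm")
```

Because the ground truth must be known at the masked positions, an evaluation of this kind is typically run on complete recordings with gaps introduced artificially, so that any imputation method can be scored with the same `partial_errors` call.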
doi:10.2196/16113 pmid:32445459 fatcat:3icve75a4faizf3qwrdosuseqe