Formative Evaluation of Consumer-Grade Activity Monitors Worn by Older Adults: Test-Retest Reliability and Criterion Validity of Step Counts (Preprint)
JMIR Formative Research
Step counts are widely used as a measure of physical activity. To assess whether commercial-grade activity monitors are appropriate for measuring step counts in the growing population of older adults, it is essential to evaluate their reliability and validity in this population. The objective was to evaluate the test-retest reliability and criterion validity of step counting in older adults with self-reported intact and limited mobility from six commercial-grade activity monitors: Fitbit Charge, Fitbit One, Garmin vívofit 2, Jawbone UP2, Misfit Shine, and New-Lifestyles NL-1000.
For test-retest reliability, participants completed two 100-step over-ground walks at usual pace while wearing all monitors. We tested the effects of activity monitor and mobility status on the absolute difference in step count error (%) between repeat trials. We also computed the standard error of measurement (SEM) between repeat trials. To assess criterion validity, participants completed two 400-metre over-ground walks at usual pace while wearing all monitors. The first walk was continuous; the second walk incorporated interruptions designed to mimic conditions of daily walking. Criterion step counts were obtained by researcher tally count. We estimated the effects of activity monitor, mobility status, and walk interruptions on step count error (%). We also generated Bland-Altman plots and conducted equivalence tests. Thirty-six individuals participated (n=20 intact mobility, n=16 limited mobility, 53% female) with mean (SD) age 71.4 (4.7) years and body mass index 29.4 (5.9) kg/m². Considering test-retest reliability, there was an effect of activity monitor (P<.001). The One (1.0%, 95% CI 0.6% to 1.3%), NL-1000 (2.6%, 95% CI 1.3% to 3.9%), and vívofit 2 (6.0%, 95% CI 3.2% to 8.8%) had the smallest mean absolute differences in step count errors. SEM values ranged from 1.0% (One) to 23.5% (UP2). Regarding criterion validity, all monitors undercounted steps. Step count error was affected by activity monitor (P<.001) and walk interruptions (P=.02). Three monitors had small mean step count errors: the Shine (-1.3%, 95% CI -19.5% to 16.8%), One (-2.1%, 95% CI -6.1% to 2.0%), and NL-1000 (-4.3%, 95% CI -18.9% to 10.3%). Mean step count error was larger during interrupted than continuous walking (-5.5% vs -3.6%, P=.02). Bland-Altman plots illustrated non-systematic bias and small limits of agreement across the range of observed step counts only for the One and UP2.
Mean step count error lay within an equivalence bound of ±5% for the One (P<.001) and Shine (P=.001). Test-retest reliability and criterion validity of step counting varied across six consumer-grade activity monitors when worn by older adults with self-reported intact and limited mobility. Walk interruptions increased step count error for all monitors, while self-reported mobility limitation did not affect step count error. The hip-worn Fitbit One was the only monitor that exhibited high test-retest reliability and criterion validity.