A numerical study of multiple imputation methods using nonparametric multivariate outlier identifiers and depth-based performance criteria with clinical laboratory data

Xin Dang, Robert Serfling
2011 Journal of Statistical Computation and Simulation  
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation methods tend to impute non-extreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of multiple imputation as well as a new proposed one, nine in all, in a setting of several actual clinical ... data sets of different dimension. Two criteria, an "outlier recovery probability" and a "relative accuracy measure", are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized PCA, are also included in the study. Consequently, not only the comparison of imputation methods, but also the comparison of outlier detection methods, is accomplished in this study. Our findings show that the performance of a multiple imputation method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, a multiple imputation method for a given data set can be selected more optimally.
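The masking effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes simulated bivariate normal data in place of the clinical data, a single regression imputation in place of multiple imputation, and the classical (non-robust) Mahalanobis distance with a chi-square cutoff as the outlier identifier. The function name `mahalanobis`, the seed, and the chosen outlier are all hypothetical.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# Assumption: simulated, positively correlated data stand in for real data.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
clean = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

mean = clean.mean(axis=0)
cov = np.cov(clean, rowvar=False)

# A point that contradicts the positive correlation of the bulk of the data.
outlier = np.array([2.0, -2.0])

# Pretend the second component is missing and fill it with its regression
# prediction from the first component (a single imputation, for illustration).
slope = cov[1, 0] / cov[0, 0]
imputed = np.array([outlier[0], mean[1] + slope * (outlier[0] - mean[0])])

# Cutoff: square root of the 0.975 chi-square quantile with 2 degrees of
# freedom, which has the closed form -2 * ln(1 - 0.975).
cutoff = float(np.sqrt(-2.0 * np.log(0.025)))

d_full = mahalanobis(outlier, mean, cov)
d_imputed = mahalanobis(imputed, mean, cov)
flagged_full = d_full > cutoff
flagged_imputed = d_imputed > cutoff

print(f"fully observed: distance={d_full:.2f}, flagged={flagged_full}")
print(f"after imputing: distance={d_imputed:.2f}, flagged={flagged_imputed}")
```

Under these assumptions the imputed point can never be farther from the center than the fully observed one, because the regression fill removes the component of the distance that comes from the deviation of the second coordinate given the first.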
doi:10.1080/00949650903437842 fatcat:bgs5urou7jebfdhrwuqwcn7a3m