Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality

Scott H. Holan, Daniell Toth, Marco A. R. Ferreira, Alan F. Karr
2010 Journal of the American Statistical Association  
Many scientific, sociological, and economic applications present data that are collected on multiple scales of resolution. One particular form of multiscale data arises when data are aggregated across different scales both longitudinally and by economic sector. Frequently, such datasets experience missing observations in a manner that they can be accurately imputed, while respecting the constraints imposed by the multiscale nature of the data, using the method we propose known as Bayesian
more » ... cale multiple imputation. Our approach couples dynamic linear models with a novel imputation step based on singular normal distribution theory. Although our method is of independent interest, one important implication of such methodology is its potential effect on confidential databases protected by means of cell suppression. In order to demonstrate the proposed methodology and to assess the effectiveness of disclosure practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW). During the course of our empirical investigation it is determined that several of the predicted cells are within 1% accuracy, thus causing potential concerns for data confidentiality.
doi:10.1198/jasa.2009.ap08629 fatcat:rsmwzmfrsjbg7ptipmnlkwhgwq