Modeling and Imputation of Large Incomplete Multidimensional Datasets [chapter]

Xintao Wu, Daniel Barbará
2002 Lecture Notes in Computer Science  
Missing or incomplete data is commonplace in large real-world databases. In this paper, we study the problem of missing values that occur in the measure dimension of a data cube. We propose a two-part mixture model, which combines a logistic model and a loglinear model, to predict and impute the missing values. The logistic model is applied to predict the positions of missing values, while the loglinear model is applied to compute the estimates. Experimental results on real and synthetic datasets are presented.
doi:10.1007/3-540-46145-0_28 fatcat:gjqqlznqbjf7veqm6wuzyazfum
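
The two-part idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model's details, so everything below is an assumption made for illustration. It assumes a two-dimensional cube, one-hot dimension indicators as logistic features, and a main-effects (independence) loglinear model fitted by an EM-style iteration. All function names (fit_logistic, predict_logistic, one_hot_cells, impute_loglinear) are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression; returns weights, bias first."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logistic(w, X):
    """Predicted probability of the positive class (here: 'cell is missing')."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def one_hot_cells(shape):
    """One feature row per cell: one-hot row index followed by one-hot column index."""
    rows, cols = np.indices(shape)
    return np.hstack([np.eye(shape[0])[rows.ravel()],
                      np.eye(shape[1])[cols.ravel()]])

def impute_loglinear(table, missing, iters=200):
    """Assumed independence loglinear model, log m_ij = u + a_i + b_j.

    EM-style loop: fill the missing cells, recompute the expected counts
    from the margins, and refill the missing cells with those expectations.
    """
    filled = np.where(missing, table[~missing].mean(), table).astype(float)
    for _ in range(iters):
        expected = np.outer(filled.sum(axis=1), filled.sum(axis=0)) / filled.sum()
        filled = np.where(missing, expected, table)
    return filled

# Toy cube whose measure is exactly multiplicative, with one missing cell.
truth = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
cube = truth.copy()
cube[0, 0] = np.nan
missing = np.isnan(cube)

# Part 1: logistic model for where values are missing.
X = one_hot_cells(cube.shape)
w = fit_logistic(X, missing.ravel().astype(float))
p_missing = predict_logistic(w, X).reshape(cube.shape)

# Part 2: loglinear model for what the missing values should be.
completed = impute_loglinear(cube, missing)
```

In this toy case the observed cells are exactly consistent with the independence model, so the imputed cell converges to the withheld value. On real data the fit is only approximate, and a richer loglinear model with interaction terms may be needed.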