Multivariate Data Quality Enhancement by Ranked Imputation

2020 VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE  
Organizational decisions are based on data-driven analysis and predictions. Effective decisions require accurate predictions, which in turn depend on the quality of the data. Real-time data is prone to inconsistencies, which degrade the quality of the predictions. This creates a need for data imputation techniques. This work presents a prediction-based data imputation technique, Rank Based Multivariate Imputation (RBMI), that operates on multivariate data. The proposed ... is composed of the ranking phase and the imputation phase. Ranking dictates the attribute order in which imputation is to be performed. The proposed model utilizes a tree-based approach for the actual imputation process. Experiments were performed on Pima, a diabetes dataset. Values were amputed (deliberately removed) at rates between 5% and 30%. The obtained results were compared with existing state-of-the-art models in terms of MAE and MSE levels. The proposed RBMI model exhibits a reduction of 0.03 in MAE levels and 0.001 in MSE levels.
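The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the ranking criterion, so the sketch assumes that attributes with fewer missing values are imputed first. The tree is a small hand-written regression tree standing in for whatever tree-based model the paper uses. The data is a synthetic toy set, not Pima, and all function names (`ampute`, `rank_columns`, `fit_tree`, `ranked_impute`, `mae_mse`) are hypothetical.

```python
import random

def ampute(rows, rate, rng):
    """Return a copy of rows with roughly `rate` of the cells set to None."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def rank_columns(rows):
    """Assumed ranking rule: columns with fewer missing cells come first."""
    n_cols = len(rows[0])
    missing = [sum(r[j] is None for r in rows) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: missing[j])

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(X, y, depth=3, min_leaf=3):
    """Tiny regression tree: a leaf is a float, a node is (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return mean
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (j, t,
            fit_tree([r for r, _ in lo], [v for _, v in lo], depth - 1, min_leaf),
            fit_tree([r for r, _ in hi], [v for _, v in hi], depth - 1, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def ranked_impute(rows):
    """Impute column by column in rank order; returns (filled rows, order)."""
    n_cols = len(rows[0])
    order = rank_columns(rows)
    # Start from a mean fill so every predictor column is complete.
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    for j in order:
        others = [k for k in range(n_cols) if k != j]
        train = [i for i, r in enumerate(rows) if r[j] is not None]
        tree = fit_tree([[filled[i][k] for k in others] for i in train],
                        [rows[i][j] for i in train])
        for i, r in enumerate(rows):
            if r[j] is None:
                filled[i][j] = predict(tree, [filled[i][k] for k in others])
    return filled, order

def mae_mse(truth, filled, masked):
    """MAE and MSE over the amputed cells only."""
    errs = [filled[i][j] - truth[i][j]
            for i, r in enumerate(masked) for j, v in enumerate(r) if v is None]
    return (sum(abs(e) for e in errs) / len(errs),
            sum(e * e for e in errs) / len(errs))

# Synthetic, correlated toy data (an assumption for illustration only).
rng = random.Random(0)
truth = []
for _ in range(80):
    a = rng.uniform(0, 10)
    truth.append([a, 2 * a + rng.gauss(0, 0.5), 15 - a + rng.gauss(0, 0.5)])

masked = ampute(truth, 0.20, rng)      # 20% lies inside the 5%-30% range
filled, order = ranked_impute(masked)
mae, mse = mae_mse(truth, filled, masked)
print("imputation order:", order)
print(f"MAE={mae:.3f} MSE={mse:.3f}")
```

Because columns imputed earlier feed the trees fitted for later columns, the ranking changes which predictors are already refined when each attribute is filled. Repeating the run for several amputation rates between 0.05 and 0.30 mirrors the experimental setup the abstract describes.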
doi:10.35940/ijitee.c9027.019320 fatcat:reupha25qbhnlpiwe4odfozwzm