A Rule-Based Classification Algorithm for Uncertain Data

Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu
2009 Proceedings / International Conference on Data Engineering  
Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. These kinds of uncertainty have to be handled cautiously, or else the mining results could be unreliable or even wrong. In this paper, we propose a new rule-based classification and prediction algorithm called uRule for classifying uncertain data. This algorithm introduces new measures for generating, pruning and optimizing rules. These new measures are computed considering uncertain data interval and probability distribution function. Based on the new measures, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The proposed uRule algorithm can process uncertainty in both numerical and categorical data. Our experimental results show that uRule has excellent performance even when data is highly uncertain.
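The abstract does not spell out the measures themselves, so the following is only an illustrative sketch of how a splitting value might be scored when each numerical value is an interval with a probability distribution. It assumes a uniform distribution over each interval and uses an entropy-based gain over fractional (probabilistic) class counts. The names `prob_leq`, `split_gain` and `best_split`, the `(low, high, label)` representation, and the choice of interval endpoints as candidate splits are all assumptions made for illustration, not uRule's actual definitions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: (low, high, label); the true value is
# assumed to be uniformly distributed on [low, high].
Instance = Tuple[float, float, str]


def prob_leq(lo: float, hi: float, s: float) -> float:
    """Probability that a value uniform on [lo, hi] is <= s."""
    if hi == lo:  # a certain (point) value
        return 1.0 if lo <= s else 0.0
    return min(1.0, max(0.0, (s - lo) / (hi - lo)))


def entropy(weights: Dict[str, float]) -> float:
    """Shannon entropy of a class distribution given fractional counts."""
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)


def split_gain(data: List[Instance], s: float) -> float:
    """Entropy reduction of splitting at s, using probabilistic counts."""
    left: Dict[str, float] = {}
    right: Dict[str, float] = {}
    whole: Dict[str, float] = {}
    for lo, hi, label in data:
        p = prob_leq(lo, hi, s)  # fraction of this instance falling left of s
        left[label] = left.get(label, 0.0) + p
        right[label] = right.get(label, 0.0) + (1.0 - p)
        whole[label] = whole.get(label, 0.0) + 1.0
    n = float(len(data))
    n_left, n_right = sum(left.values()), sum(right.values())
    return (entropy(whole)
            - (n_left / n) * entropy(left)
            - (n_right / n) * entropy(right))


def best_split(data: List[Instance]) -> Tuple[float, float]:
    """Try every interval endpoint as a split value; return (value, gain)."""
    candidates = sorted({e for lo, hi, _ in data for e in (lo, hi)})
    return max(((s, split_gain(data, s)) for s in candidates),
               key=lambda t: t[1])


data = [(0.0, 1.0, "a"), (0.5, 1.5, "a"), (3.0, 4.0, "b"), (3.5, 4.5, "b")]
split_value, gain = best_split(data)
```

In this toy data the two classes occupy non-overlapping intervals, so a split between them separates the classes completely. When intervals overlap, each instance contributes fractionally to both sides of the split, which is the sense in which the interval and its distribution enter the measure in this sketch.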
doi:10.1109/icde.2009.164 dblp:conf/icde/QinXPT09 fatcat:6kykz5liwbb2pgrqgrd6qdzk6u