An Efficient Compression Algorithm for Uncertain Databases Aimed at Mining Problems
Lecture Notes on Software Engineering
Many studies on association rule mining have focused on item sets from precise data, in which the presence or absence of items in transactions is known with certainty. In some applications, however, the presence or absence of items in transactions is uncertain, and the knowledge discovered from this type of data can only be extracted approximately. Data compression offers a good way to reduce data size, which can shorten the time needed to discover useful knowledge. In this paper we suggest a new
... to compress transactions from an uncertain database, based on a modified version of the M2TQT (Mining Merged Transactions with the Quantification Table) approach and fuzzy logic concepts. The algorithm groups the uncertain data into a set of clusters using the K-means algorithm and uses a fuzzy membership function to assign each transaction item to one of those clusters. Finally, the modified version of M2TQT is employed to compress the classified transactions. The key idea of our algorithm is that, since uncertain data is probabilistic in nature and the frequency of an item set is counted as an expected value, the compressed transactions give approximate values for the item set's support. Experimental results show that the proposed algorithm is better than the U-Apriori algorithm on large uncertain databases.
Index Terms: Rule mining, database compression, uncertain database, fuzzy logic.
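
The key idea stated above can be illustrated with a minimal sketch. It is not the modified M2TQT algorithm itself: the cluster centres are assumed to be given (the paper obtains them with K-means), the fuzzy membership step is replaced by a plain nearest-centre assignment, and merging simply counts identical quantized transactions. All function names and the sample data are hypothetical. Expected support is taken in the usual sense for uncertain data: the sum, over transactions, of the product of the existence probabilities of the items in the item set.

```python
from collections import Counter
from math import prod


def expected_support(transactions, itemset):
    """Exact expected support: sum over transactions of the product of the
    existence probabilities of the items (a missing item has probability 0)."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in transactions)


def quantize(p, centers):
    """Snap a probability to its nearest cluster centre.
    Assumption: hard assignment stands in for the fuzzy membership step."""
    return min(centers, key=lambda c: abs(c - p))


def compress(transactions, centers):
    """Quantize every probability, then merge transactions that became
    identical, keeping a count of how many originals each one represents."""
    merged = Counter()
    for t in transactions:
        key = frozenset((item, quantize(p, centers)) for item, p in t.items())
        merged[key] += 1
    return merged


def approx_expected_support(merged, itemset):
    """Approximate expected support computed on the compressed transactions."""
    total = 0.0
    for key, count in merged.items():
        t = dict(key)
        total += count * prod(t.get(i, 0.0) for i in itemset)
    return total


# Hypothetical sample data: each transaction maps an item to its probability.
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.85, "b": 0.75},
    {"a": 0.2, "c": 0.9},
    {"a": 0.25, "c": 0.95},
]
centers = [0.2, 0.8]  # assumed to come from a prior clustering step

compressed = compress(transactions, centers)
exact = expected_support(transactions, {"a", "b"})
approx = approx_expected_support(compressed, {"a", "b"})
```

On this sample the four transactions collapse into two merged ones, and the expected support of {a, b} changes from about 1.3575 (exact) to about 1.28 (compressed). This is the trade-off the abstract describes: fewer transactions to scan in exchange for an approximate support value.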