Support Estimation in Frequent Itemset Mining by Locality Sensitive Hashing

Annika Pick, Tamás Horváth, Stefan Wrobel
2019 Lernen, Wissen, Daten, Analysen  
The main computational effort in generating all frequent itemsets in a transactional database lies in the step of deciding whether an itemset is frequent or not. We present a method for estimating itemset supports with two-sided error. In a preprocessing step, our algorithm first partitions the database into groups of similar transactions by using locality sensitive hashing and calculates a summary for each of these groups. The support of a query itemset is then estimated by means of these summaries. Our preliminary empirical results indicate that the proposed method results in a speed-up of up to a factor of 50 on large datasets. The F-measure of the output patterns varies between 0.83 and 0.99.
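
The pipeline described in the abstract (hash transactions into groups, summarize each group, answer support queries from the summaries) can be illustrated with a minimal Python sketch. The abstract does not name the hash family, the form of the summaries, or the estimator, so everything concrete here is an assumption made for illustration: MinHash as the locality sensitive hash, a summary consisting of the group size plus per-item counts, and an estimator that treats items as independent within a group. All function names are hypothetical and this is not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

PRIME = 2_147_483_647  # 2^31 - 1, modulus for the hash functions


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(transaction, params):
    """MinHash signature of a non-empty set of integer item ids (assumed LSH)."""
    return tuple(min((a * item + b) % PRIME for item in transaction)
                 for a, b in params)


def build_summaries(transactions, params):
    """Group transactions by signature and summarize each group.

    Assumed summary format: (group size, per-item occurrence counts).
    """
    groups = defaultdict(list)
    for t in transactions:
        groups[minhash_signature(t, params)].append(t)
    summaries = []
    for members in groups.values():
        counts = Counter(item for t in members for item in t)
        summaries.append((len(members), counts))
    return summaries


def estimate_support(itemset, summaries):
    """Estimate the absolute support of `itemset` from the summaries alone.

    Assumed heuristic: within a group, items occur independently, so the
    estimate can be too high or too low (a two-sided error).
    """
    total = 0.0
    for size, counts in summaries:
        est = float(size)
        for item in itemset:
            est *= counts.get(item, 0) / size
        total += est
    return total


if __name__ == "__main__":
    db = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({4, 5}),
          frozenset({1, 2, 3})]
    summaries = build_summaries(db, make_hash_params(4, seed=1))
    print(estimate_support({1, 2}, summaries))
```

In this sketch a query touches one summary per group rather than every transaction, which is where a speed-up would come from when groups are large. The estimate is exact whenever all transactions in a group are identical, and its accuracy degrades as groups become more heterogeneous.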
dblp:conf/lwa/Pick0W19 fatcat:aka6cmx6ejhdlibjiqwqpa2u5a