DensEst: Density Estimation for Data Mining in High Dimensional Spaces [chapter]

Emmanuel Müller, Ira Assent, Ralph Krieger, Stephan Günnemann, Thomas Seidl
2009 Proceedings of the 2009 SIAM International Conference on Data Mining  
Subspace clustering and frequent itemset mining via "step-by-step" algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large high-dimensional databases. Recent "jump" algorithms directly choose candidate subspace regions or patterns. Their scalability and quality depend heavily on the rating of these candidates, as misleading jumps incur poor results and costly candidate refinements. Existing techniques rely either on simple statistics with low estimation quality or on inefficient database scans. In this work, we propose DensEst, an efficient density estimator with significantly improved accuracy. It efficiently provides rough estimates of object counts in selective subspace regions. Furthermore, by incorporating correlations between dimensions, DensEst achieves not only efficient but also highly accurate estimations. We show how this density estimation technique can be easily integrated into subspace clustering and frequent itemset mining algorithms to improve both their efficiency and accuracy. We demonstrate the performance of our density estimation technique in thorough experiments and show its efficiency and accuracy improvement for existing algorithms.

[...] Since R has cardinality n, the induction hypothesis yields: [...] We use the intersection rule of conditional independence [...] X_{B∪C} | X_D with A = {l}, B = {m}, C = R, and D = {k} ∪ (R\R): X_{l} ⊥ X_{{m}∪R} | X_{{k}∪(R\R)} ⇔ X_{l} ⊥ X_R | X_{{k}∪(R\R)}
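
The contrast the abstract draws between simple statistics and full database scans can be illustrated with a small sketch. The code below is not DensEst; it is a hypothetical toy comparison, assuming axis-parallel regions and synthetic data, between an estimate built only from one-dimensional selectivities (which implicitly assumes independent dimensions) and the exact count from a scan. All names (`exact_count`, `independence_estimate`, the toy data) are assumptions introduced for illustration.

```python
import random


def exact_count(data, region):
    """Count objects inside an axis-parallel region by a full scan.

    `region` maps a dimension index to a half-open interval (low, high).
    """
    return sum(
        all(lo <= obj[d] < hi for d, (lo, hi) in region.items())
        for obj in data
    )


def independence_estimate(data, region):
    """Estimate the count from one-dimensional selectivities only.

    Multiplying per-dimension selectivities assumes the dimensions are
    statistically independent. The selectivities are computed by scanning
    here for brevity; in practice they would come from precomputed
    one-dimensional statistics such as histograms.
    """
    n = len(data)
    estimate = float(n)
    for d, (lo, hi) in region.items():
        selectivity = sum(lo <= obj[d] < hi for obj in data) / n
        estimate *= selectivity
    return estimate


# Hypothetical toy data: dimension 1 is a noisy copy of dimension 0,
# dimension 2 is drawn independently of both.
rng = random.Random(0)
data = []
for _ in range(10_000):
    x = rng.random()
    y = min(max(x + rng.gauss(0, 0.05), 0.0), 1.0)
    z = rng.random()
    data.append((x, y, z))

correlated_region = {0: (0.0, 0.2), 1: (0.0, 0.2)}
independent_region = {0: (0.0, 0.2), 2: (0.0, 0.2)}

for name, region in [("correlated", correlated_region),
                     ("independent", independent_region)]:
    print(name,
          "exact:", exact_count(data, region),
          "independence estimate:", round(independence_estimate(data, region)))
```

On the correlated pair of dimensions the independence-based estimate falls far below the true count, while on the independent pair it stays close. This is the kind of gap that an estimator incorporating correlations between dimensions is meant to close without paying for a scan.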
doi:10.1137/1.9781611972795.16 dblp:conf/sdm/MullerAKGS09 fatcat:cvdjqxwobbe2ni3ywq7jcamjka