Gibbs Sampling Subjectively Interesting Tiles [chapter]

Anes Bendimerad, Jefrey Lijffijt, Marc Plantevit, Céline Robardet, Tijl De Bie
2020 Lecture Notes in Computer Science  
The local pattern mining literature has long struggled with the so-called pattern explosion problem: the size of the set of patterns found exceeds the size of the original data. This causes computational problems (enumerating a large set of patterns will inevitably take a substantial amount of time) as well as problems for interpretation and usability (trawling through a large set of patterns is often impractical). Two complementary research lines aim to address this problem. The first aims to develop better measures of interestingness, in order to reduce the number of uninteresting patterns that are returned [6, 10]. The second aims to avoid an exhaustive enumeration of all 'interesting' patterns (where interestingness is quantified in a more traditional way, e.g. frequency), by directly sampling from this set in such a way that more 'interesting' patterns are sampled with higher probability [2]. Unfortunately, the first research line does not reduce computational cost, while the second may miss out on the most interesting patterns. In this paper, we combine the best of both worlds for mining interesting tiles [8] from binary databases. Specifically, we propose a new pattern sampling approach based on Gibbs sampling, where the probability of sampling a pattern is proportional to its subjective interestingness [6], an interestingness measure reported to better represent true interestingness. The experimental evaluation confirms the theory, but also reveals an important weakness of the proposed approach, which we speculate is shared with any other pattern sampling approach. We thus conclude with a broader discussion of this issue, and a forward look.
doi:10.1007/978-3-030-44584-3_7 fatcat:6djzaqymsbcbtbqlocobtd2gdi