Association Rule Mining using Maximum Entropy
Recommendations based on behavioral data may be faced with ambiguous statistical evidence. We consider the case of association rules, relevant e.g. for query and product recommendations. For example, suppose that a customer belongs to categories A and B, each of which is known to be positively correlated with buying product C. How do we estimate the probability that she will buy product C? For rare terms or products there may not be enough data to directly produce such an estimate --- perhaps
we never directly observed a connection between A, B, and C. What can we do when there is no support for estimating the probability by simply computing the observed frequency? In particular, what is the right thing to do when A and B give rise to very different estimates of the probability of C? We consider the use of maximum entropy probability estimates, which give a principled way of extrapolating probabilities of events that do not even occur in the data set! Focusing on the basic case of three variables, our main technical contributions are that (under mild assumptions): 1) there exists a simple, explicit formula that gives a good approximation of maximum entropy estimates, and 2) maximum entropy estimates based on a small number of samples are provably tightly concentrated around the true maximum entropy frequency that arises if we let the number of samples go to infinity. Our empirical work demonstrates the surprising precision of maximum entropy estimates across a range of real-life transaction data sets. In particular, we observe that the average absolute error of maximum entropy estimates is a factor of 3--14 lower than that of independence or extrapolation estimates when the data used to make the estimates has low support. We believe that the same principle can be used to synthesize probability estimates in many settings.
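
To make the three-variable setting concrete, the sketch below computes a maximum entropy estimate of P(C | A, B) from the three pairwise tables alone. It assumes that the constraints are the pairwise joint distributions of (A, B), (A, C), and (B, C), and it uses iterative proportional fitting, a standard numerical technique swapped in here for illustration. It is not the explicit approximation formula mentioned above, which this abstract does not state. All function names and the toy numbers are hypothetical.

```python
from itertools import product

CELLS = list(product((0, 1), repeat=3))  # all (a, b, c) combinations
AXES = ((0, 1), (0, 2), (1, 2))          # index pairs for the AB, AC, BC tables


def pairwise_marginals(q):
    """Return the 2x2 tables P(A,B), P(A,C), P(B,C) of a joint table q[a][b][c]."""
    p_ab = [[q[a][b][0] + q[a][b][1] for b in (0, 1)] for a in (0, 1)]
    p_ac = [[q[a][0][c] + q[a][1][c] for c in (0, 1)] for a in (0, 1)]
    p_bc = [[q[0][b][c] + q[1][b][c] for c in (0, 1)] for b in (0, 1)]
    return p_ab, p_ac, p_bc


def maxent_joint(p_ab, p_ac, p_bc, sweeps=500):
    """Iterative proportional fitting from the uniform table.

    For consistent pairwise tables, the iterates approach the maximum
    entropy joint distribution that matches all three of them.
    """
    q = [[[0.125, 0.125] for _ in (0, 1)] for _ in (0, 1)]
    targets = (p_ab, p_ac, p_bc)
    for _ in range(sweeps):
        for k, (i, j) in enumerate(AXES):
            current = pairwise_marginals(q)[k]
            for cell in CELLS:
                a, b, c = cell
                have = current[cell[i]][cell[j]]
                want = targets[k][cell[i]][cell[j]]
                # Rescale the cell so that this pairwise table is matched.
                q[a][b][c] = q[a][b][c] * want / have if have > 0 else 0.0
    return q


def prob_c_given_ab(q):
    """Estimate P(C=1 | A=1, B=1) from a joint table q[a][b][c]."""
    return q[1][1][1] / (q[1][1][0] + q[1][1][1])


# Toy joint table (made-up numbers), used only to produce consistent
# pairwise tables; the fit itself never sees the three-way counts.
toy = [[[0.30, 0.05], [0.15, 0.05]], [[0.15, 0.05], [0.10, 0.15]]]
fitted = maxent_joint(*pairwise_marginals(toy))
print(prob_c_given_ab(fitted))
```

Starting from the uniform table matters: among all joint distributions that agree with the pairwise tables, the fit selects the one with maximum entropy, so it yields an estimate even when the combination (A, B, C) has no direct support in the data.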