Online active inference and learning
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11
We present a generalized framework for active inference, the selective acquisition of labels for cases at prediction time in lieu of using the estimated labels of a predictive model. We develop techniques within this framework for classifying in an online setting, for example, for classifying the stream of web pages where online advertisements are being served. Stream applications present novel complications because (i) at the time of label acquisition, we don't know the set of instances that we will eventually see, and (ii) instances repeat based on some unknown (and possibly skewed) distribution. We combine ideas from decision theory, cost-sensitive learning, and online density estimation. We also introduce a method for online estimation of the utility distribution, which allows us to manage the budget over the stream. The resulting model tells which instances to label so that by the end of each budget period, the budget is best spent (in expectation). The main results show that (1) our proposed approach to active inference on streams can indeed reduce error costs substantially over alternative approaches, and (2) more sophisticated online estimations achieve larger reductions in error. We next discuss simultaneously conducting active inference and active learning. We show that our expected-utility active inference strategy also selects good examples for learning. We close by pointing out that our utility-distribution estimation strategy can also be applied to convert pool-based active learning techniques into budget-sensitive online active learning techniques.
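As a rough illustration of the budget-management idea, the following minimal sketch labels a streamed instance only when its utility falls in the top fraction of recently observed utilities that the remaining budget can cover. It is not the paper's algorithm: the names expected_error_cost and BudgetedSelector, the sliding-window empirical distribution, the quantile rule, and the use of expected misclassification cost as the utility are all assumptions made for illustration.

```python
from collections import deque


def expected_error_cost(p_pos, cost_fp=1.0, cost_fn=1.0):
    """Expected cost of the cheaper prediction when P(y = 1) = p_pos.

    Assumption: this cost stands in for the utility of acquiring the label.
    """
    return min((1.0 - p_pos) * cost_fp, p_pos * cost_fn)


class BudgetedSelector:
    """Hypothetical helper: spend a label budget over a stream."""

    def __init__(self, budget, horizon, window=1000):
        self.budget = budget              # labels left in this budget period
        self.remaining = horizon          # instances still expected this period
        self.seen = deque(maxlen=window)  # recent utilities (empirical distribution)

    def threshold(self):
        """Utility cutoff such that about budget/remaining of instances pass."""
        if not self.seen or self.remaining <= 0:
            return 0.0
        frac = min(1.0, self.budget / self.remaining)
        ranked = sorted(self.seen, reverse=True)
        k = int(frac * len(ranked))       # how many recent utilities we could afford
        return ranked[max(0, k - 1)]

    def decide(self, utility):
        """Return True if the instance should be labeled; update the state."""
        take = self.budget > 0 and utility >= self.threshold()
        self.seen.append(utility)
        self.remaining -= 1
        if take:
            self.budget -= 1
        return take


if __name__ == "__main__":
    selector = BudgetedSelector(budget=1, horizon=5)
    # Assumed warm-up: utilities observed in an earlier period.
    selector.seen.extend([0.1, 0.2, 0.3, 0.4, 0.5])
    for p in [0.9, 0.5, 0.3, 0.05, 0.6]:
        u = expected_error_cost(p)
        print(f"p={p:.2f} utility={u:.2f} label={selector.decide(u)}")
```

Because the threshold is recomputed from the remaining budget and the remaining horizon at every step, the cutoff loosens when labels are left over late in the period and tightens when they are being spent too quickly. A fuller treatment would also weight each utility by how often the instance is expected to repeat, which this sketch ignores.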