Noisy Active Learning from a Bayesian Perspective

Benjamin Rupprechter
2012
We approach the problem of active learning from a Bayesian perspective, working with a probability distribution over the solution space. In addition to the classical active selection of data points, we formulate the construction of minimum decision trees from noisy datasets as an active learning task. Building on the OASIS algorithm, we compare active learning score functions based on the EC² criterion with uncertainty sampling, two GBS approaches, and random selection by performing ... on several standard datasets. Although the OASIS algorithm is a unique approach in its original online setting, our findings indicate that it does not generally offer performance benefits in a classical offline setting. Furthermore, for active learning from noisy data samples, we introduce a new intrinsic EC²-based stopping criterion and show that in many cases it outperforms a standard information-gain-based method paired with the χ²-test on the decision tree learning problem. In particular, our criterion enables the effective construction of small decision trees, and may provide the first effective method to learn a close-to-minimal tree classifier with bounded expected error rates from noisy data samples.
My parents, Josef and Madeleine Rupprechter, deserve special thanks along with the rest of my family for their ongoing support, without which the writing of this thesis would have been impossible. Last but not least, a big thank you to the two loves of my life, Maya Minwary and Jesus Christ.
doi:10.3929/ethz-a-007305105 fatcat:vreg65rmijdibate5t7zbj5znq