Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing

Yukun Chen, Subramani Mani
2011 Journal of Machine Learning Research
The common uncertainty sampling approach searches for the most uncertain samples, those closest to the decision boundary of a classification task. However, it can fail to find these samples when the underlying probabilistic model is poor. In this work, we develop an active learning strategy called "Uncertainty Sampling with Biasing Consensus" (USBC), which classifies unbalanced data with a multi-model committee and ranks the informativeness of samples by uncertainty sampling with a higher weight on the minority class. For prediction, USBC uses multiple Random Forests-based models that generate a consensus posterior probability for each sample. To further improve initial performance in active learning, we also use a semi-supervised learning model that self-labels predicted negative samples without querying. For more stable initial performance, we use a filter to avoid querying samples with high variance. We also introduce batch size validation to find the optimal initial batch size for querying samples in active learning.
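The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the function name `usbc_rank`, the parameters `bias`, `max_variance`, and `batch_size`, and the specific weighting (multiplying the uncertainty of samples predicted as minority class by `1 + bias`) are all assumptions made for illustration. The committee posteriors are passed in as plain lists and could come from any set of models.

```python
from statistics import mean, pvariance


def usbc_rank(committee_probs, bias=0.5, max_variance=0.05, batch_size=2):
    """Pick sample indices to query (hypothetical scoring, not from the paper).

    committee_probs[i] holds the minority-class posterior that each
    committee member assigns to unlabeled sample i.
    """
    scored = []
    for idx, probs in enumerate(committee_probs):
        # Consensus posterior: average over the committee members.
        consensus = mean(probs)
        # Variance filter: skip samples the committee disagrees on strongly.
        if pvariance(probs) > max_variance:
            continue
        # Uncertainty is 1.0 at the decision boundary (0.5) and 0.0 at 0 or 1.
        uncertainty = 1.0 - 2.0 * abs(consensus - 0.5)
        # Assumed biasing rule: up-weight samples predicted as minority class.
        weight = 1.0 + bias if consensus >= 0.5 else 1.0
        scored.append((uncertainty * weight, idx))
    # Highest score first; ties broken by the lower sample index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:batch_size]]


if __name__ == "__main__":
    probs = [
        [0.90, 0.95, 0.85],  # confident minority
        [0.55, 0.60, 0.50],  # uncertain, leaning minority
        [0.45, 0.40, 0.50],  # uncertain, leaning majority
        [0.10, 0.90, 0.50],  # high committee variance, filtered out
        [0.05, 0.10, 0.00],  # confident majority
    ]
    print(usbc_rank(probs))
```

In this sketch, two samples equally close to the boundary are ordered so that the one leaning toward the minority class is queried first, and a sample on which the committee disagrees sharply is never queried, whatever its consensus value.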
dblp:journals/jmlr/ChenM11 fatcat:ufkim6golnfajle7ykd6cdkny4