Access to Unlabeled Data can Speed up Prediction Time

Ruth Urner, Shai Shalev-Shwartz, Shai Ben-David
2011 International Conference on Machine Learning  
Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of
more » ... classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements by the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from much fewer labeled examples than without such a sample.
dblp:conf/icml/UrnerSB11 fatcat:oazgr5dk2rhpniy4xgdjwmauqy