Semi-supervised learning from unbalanced labeled data: An improvement

Te-Ming Huang, Vojislav Kecman
2006 Journal of Knowledge-based & Intelligent Engineering Systems  
We present a potentially large improvement for semi-supervised learning from training data sets in which only a small fraction of the data pairs is labeled. In particular, we propose a novel decision strategy based on normalized model outputs. The paper compares the performance of two popular semi-supervised approaches (the Consistency Method and the Harmonic Gaussian Model) on unbalanced and balanced labeled data, with and without normalization of the models' outputs. Experiments on ... text categorization problems suggest significant improvements in classification performance for models that use normalized outputs as the basis for the final decision.
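The decision strategy named in the abstract can be illustrated with a minimal sketch. This record does not give the normalization formula, so the column-wise standardization used here (zero mean and unit standard deviation per class) is an assumption. The function name `decide` and the toy score matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def decide(F, normalize=True):
    """Assign each point to the class with the largest (optionally normalized) output.

    F is an (n_points, n_classes) matrix of model outputs, e.g. from a
    graph-based semi-supervised model. The standardization below is an
    assumed form of "normalized model outputs".
    """
    F = np.asarray(F, dtype=float)
    if normalize:
        std = F.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
        F = (F - F.mean(axis=0)) / std
    return F.argmax(axis=1)

# Hypothetical outputs: class 0 had more labeled points, so its column
# dominates the raw scores for every data point.
F = np.array([[0.9, 0.50],
              [0.8, 0.10],
              [0.7, 0.60],
              [0.6, 0.05]])

raw = decide(F, normalize=False)   # decision on raw outputs
norm = decide(F, normalize=True)   # decision on normalized outputs
print(raw.tolist(), norm.tolist())
```

On this toy input the raw decision puts every point in class 0, while the normalized decision splits the points between the two classes. This is the kind of bias from unbalanced labels that the abstract says normalization addresses.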
doi:10.3233/kes-2006-10102 fatcat:ee66v62l45a7pedmigrpc2zydm