Combining Co-Training with Ensemble Learning for Application on Single-View Natural Language Datasets

2013 Acta Polytechnica Hungarica  
In this paper we propose a novel semi-supervised learning algorithm, called the Random Split Statistic algorithm (RSSalg), designed to exploit the advantages of the co-training algorithm while being exempt from the co-training requirement that an adequate feature split exist in the dataset. In our method, the co-training algorithm is run a predefined number of times, using a different random split of features in each run. Each run of co-training produces a different enlarged training set, consisting of the initial labeled data and the data labeled in the co-training process. Examples from the enlarged training sets are combined into a final training set and pruned in order to keep only the most confidently labeled ones. The final classifier in RSSalg is obtained by training the base learner on the set created this way. Pruning of the examples is done by employing a genetic algorithm that keeps only the most reliable and informative cases. Our experiments, performed on 17 datasets with various characteristics, show that RSSalg outperforms all considered alternative methods on the more redundant natural language datasets and is comparable to the considered alternative settings on the datasets with less redundancy.
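
The procedure described above (repeated co-training over random feature splits, pooling of the co-labeled examples, pruning, and a final retraining of the base learner) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the base learner is a toy nearest-centroid classifier, the function names (`centroid_fit`, `centroid_predict`, `co_train`, `rss_sketch`) and all parameters are hypothetical, and the genetic-algorithm pruning step is replaced by two fixed thresholds (`min_runs`, `min_agreement`) on how often and how consistently an example was labeled across runs.

```python
import random
from collections import Counter, defaultdict

def centroid_fit(X, y, feats):
    """Toy base learner (assumption): per-class centroids over the features in feats."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += xi[f]
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x, feats):
    """Return (label, margin); margin is the distance gap between the two nearest centroids."""
    d = sorted((sum((x[f] - m[j]) ** 2 for j, f in enumerate(feats)), c)
               for c, m in model.items())
    margin = d[1][0] - d[0][0] if len(d) > 1 else 0.0
    return d[0][1], margin

def co_train(X_l, y_l, X_u, view_a, view_b, rounds=5, per_round=2):
    """One co-training run on two feature views; returns {unlabeled index: label}."""
    X_l, y_l = list(X_l), list(y_l)
    pool, assigned = set(range(len(X_u))), {}
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                break
            model = centroid_fit(X_l, y_l, view)
            # Rank the remaining unlabeled examples by this view's confidence (margin).
            scored = sorted(((centroid_predict(model, X_u[i], view), i) for i in sorted(pool)),
                            key=lambda t: -t[0][1])
            for (label, _), i in scored[:per_round]:
                assigned[i] = label
                X_l.append(X_u[i])
                y_l.append(label)
                pool.discard(i)
    return assigned

def rss_sketch(X_l, y_l, X_u, n_splits=20, min_runs=0.5, min_agreement=0.8, seed=0):
    """Run co-training on n_splits random feature splits, pool the labels, prune, retrain."""
    rng = random.Random(seed)
    n_feat = len(X_l[0])
    votes = defaultdict(Counter)  # unlabeled index -> label counts across runs
    for _ in range(n_splits):
        feats = list(range(n_feat))
        rng.shuffle(feats)  # a different random split of the features in each run
        half = n_feat // 2
        run = co_train(X_l, y_l, X_u, feats[:half], feats[half:])
        for i, label in run.items():
            votes[i][label] += 1
    X_final, y_final = list(X_l), list(y_l)
    for i, counts in votes.items():
        label, top = counts.most_common(1)[0]
        total = sum(counts.values())
        # Fixed thresholds stand in for the genetic-algorithm pruning (assumption).
        if total >= min_runs * n_splits and top / total >= min_agreement:
            X_final.append(X_u[i])
            y_final.append(label)
    # Final classifier: the base learner trained on the pruned, enlarged set.
    return centroid_fit(X_final, y_final, list(range(n_feat))), X_final, y_final
```

The thresholds keep an unlabeled example only if it was labeled in at least a `min_runs` fraction of the runs and its majority label won at least a `min_agreement` fraction of those votes. In the method summarized above, the selection of reliable and informative examples is instead carried out by a genetic algorithm.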
doi:10.12700/aph.10.02.2013.2.10. fatcat:rnmxjpsk5nctffovkafrrtcnce