The effectiveness of homogenous ensemble classifiers for Turkish and English texts

Zeynep Hilal Kilimci, Selim Akyokus, Sevinc Ilhan Omurca
2016 2016 International Symposium on INnovations in Intelligent SysTems and Applications (INISTA)  
Text categorization has become an increasingly popular and important problem because of the proliferation of documents in many fields. To cope with this problem, several machine learning techniques are used for categorization, such as naïve Bayes, support vector machines, and artificial neural networks. In this study, we concentrate on ensembles of multiple classifiers instead of a single classifier. We perform a comparative analysis of the impact of ensemble techniques in the text categorization domain. To this end, base classifiers of the same type are trained on diversified training sets, an arrangement referred to as homogenous ensembles. To diversify the training data, various ensemble algorithms are used, such as Bagging, Boosting, Random Subspace, and Random Forest. Multivariate Bernoulli naïve Bayes is chosen as the base classifier because of its superior classification performance compared with other single classifiers. Extensive comparative experiments are conducted on four widely used text categorization datasets in both Turkish and English. Finally, the effectiveness of the ensemble algorithms is discussed.
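As an illustration of the homogenous-ensemble idea described in the abstract, the following is a minimal sketch of Bagging over multivariate Bernoulli naïve Bayes base classifiers. It is not the authors' implementation: the function names, the toy documents, the ensemble size, and the use of Laplace smoothing are all assumptions made for this example, and only Bagging (not Boosting, Random Subspace, or Random Forest) is shown.

```python
import math
import random
from collections import Counter

def train_bernoulli_nb(docs, labels, vocab):
    """Fit a multivariate Bernoulli naive Bayes model (Laplace smoothing assumed)."""
    n_docs = len(docs)
    model = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        # Smoothed P(term present | class); never exactly 0 or 1.
        p_term = {t: (sum(t in d for d in class_docs) + 1) / (n_c + 2) for t in vocab}
        model[c] = (math.log(n_c / n_docs), p_term)
    return model

def predict_bernoulli_nb(model, doc):
    """Return the class with the highest log-posterior; absent terms also count."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, p_term) in model.items():
        score = log_prior
        for t, p in p_term.items():
            score += math.log(p if t in doc else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def train_bagging(docs, labels, vocab, n_models=11, seed=0):
    """Train n_models base classifiers, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(docs)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(
            train_bernoulli_nb([docs[i] for i in idx], [labels[i] for i in idx], vocab)
        )
    return ensemble

def predict_bagging(ensemble, doc):
    """Combine the base classifiers by majority vote."""
    votes = Counter(predict_bernoulli_nb(m, doc) for m in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match", "team"}, {"team", "coach", "win"}, {"match", "score", "goal"},
    {"election", "vote", "party"}, {"party", "minister", "vote"}, {"election", "minister", "law"},
]
labels = ["sports"] * 3 + ["politics"] * 3
vocab = sorted(set().union(*docs))

single_model = train_bernoulli_nb(docs, labels, vocab)
ensemble = train_bagging(docs, labels, vocab)
single_prediction = predict_bernoulli_nb(single_model, {"goal", "team"})
ensemble_prediction = predict_bagging(ensemble, {"goal", "team"})
```

Every base classifier has the same type and differs only in the bootstrap sample it was trained on, which is what makes the ensemble homogenous in the sense used above. The other algorithms named in the abstract diversify the training data in other ways (for example, Random Subspace samples features rather than documents).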
doi:10.1109/inista.2016.7571854 dblp:conf/inista/KilimciAO16 fatcat:gg2xmmikhrbufmylbklspkrcfi