A Comparison of Oversampling Methods on Imbalanced Topic Classification of Korean News Articles

Yirey Suh, Cheongtag Kim, Leegu Song, Jaemyung Yu, Jonghoon Mo
2017 Journal of Cognitive Science  
Machine learning has progressed to match human performance in fields including text classification. However, when training data are imbalanced, classifiers do not perform well. Oversampling is one way to overcome the problem of imbalanced data, and many oversampling methods can be conveniently implemented. While comparative studies of oversampling methods on non-text data have been conducted, studies comparing oversampling methods under a unifying framework on text data are ... scarce. This study finds that while oversampling methods generally improve the performance of classifiers, similarity is an important factor that influences the performance of classifiers on imbalanced and resampled data.
doi:10.17791/jcs.2017.18.4.391 fatcat:5mqa6rz6jffn7mlsbwf4o3sg4u
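
For readers unfamiliar with the term, the simplest form of oversampling (random oversampling) can be sketched as below. This is only an illustration under stated assumptions: the abstract does not name the specific methods the study compares, the function name `random_oversample` is hypothetical, and the toy headlines and labels are invented for the example rather than taken from the paper's data.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen items of each smaller class until every
    class has as many items as the largest class.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    rng = random.Random(seed)

    # Group the documents by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Invented toy data: four "politics" items and one "sports" item.
texts = [
    "election results",
    "budget vote",
    "cabinet reshuffle",
    "parliament session",
    "league final",
]
labels = ["politics", "politics", "politics", "politics", "sports"]

new_texts, new_labels = random_oversample(texts, labels)
print(Counter(new_labels))
```

After resampling, both classes contain four items, with the single minority item repeated. Because the added items are exact copies, the resampled data are highly similar to the originals, which is one reason similarity can matter when classifiers are trained on resampled data.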