Using GAN-based models to sentimental analysis on imbalanced datasets in education domain [article]

Ru Yang, Maryam Edalati
2021 arXiv   pre-print
While the whole world is still struggling with the COVID-19 pandemic, online learning and home office become more common. Many schools transfer their courses teaching to the online classroom. Therefore, it is significant to mine the students' feedback and opinions from their reviews towards studies so that both schools and teachers can know where they need to improve. This paper trains machine learning and deep learning models using both balanced and imbalanced datasets for sentiment
more » ... ion. Two SOTA category-aware text generation GAN models: CatGAN and SentiGAN, are utilized to synthesize text used to balance the highly imbalanced dataset. Results on three datasets with different imbalance degree from distinct domains show that when using generated text to balance the dataset, the F1-score of machine learning and deep learning model on sentiment classification increases 2.79% ~ 9.28%. Also, the results indicate that the average growth degree for CR100k is higher than CR23k, the average growth degree for deep learning is more increased than machine learning algorithms, and the average growth degree for more complex deep learning models is more increased than simpler deep learning models in experiments.
arXiv:2108.12061v1 fatcat:pkzclefb5bgwfh4pwwovcmti4u