Classification of Proactive Personality: Text Mining Based on Weibo Text and Short-answer Questions Text

Peng Wang, Yun Yan, Yingdong Si, Gancheng Zhu, Xiangping Zhan, Jun Wang, Runsheng Pan
2020 IEEE Access  
This study focused on predicting "proactive personality". From 901 participants selected by cluster sampling, we obtained targeted short-answer question text and participants' social media post text (Weibo), while participants' proactive personality labels were assigned by experts. For classification, five machine learning algorithms, including Support Vector Machine (SVM), XGBoost, K-Nearest-Neighbors (KNN), Naive Bayes (NB) and Logistic Regression (LR) ... deployed. Seven indicators, namely Accuracy (ACC), F1-score (F1), Sensitivity (SEN), Specificity (SPE), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Area under Curve (AUC), combined with hierarchical cross-validation, were used to evaluate the models comprehensively. With participants' Weibo text and short-answer question text, we proposed a new approach to classifying individuals' proactive personality based on text mining technology. The results showed that the short-answer questions + Weibo text datasets performed best, followed by the short-answer question text datasets, while the Weibo text datasets performed worst. However, it is noteworthy that Weibo text had the highest average SPE, which indicated that Weibo text played an important role in identifying individuals with low proactive personality. Adding Weibo text also improved SEN compared with using short-answer question text alone. In addition, across all three datasets SPE was always higher than SEN, indicating that this text classification approach was better at identifying college students with low proactive personality. As for algorithms, Support Vector Machine and Logistic Regression showed steadier performance than the other algorithms.
doi:10.1109/access.2020.2995905 fatcat:uqd4sd5hkncmthkoplb63yrrfa
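
The seven indicators named in the abstract can be made concrete with a small sketch. This is not the authors' code: it assumes binary labels in which 1 means high proactive personality and 0 means low (an assumption consistent with the abstract's reading of SPE as the ability to identify low-proactive individuals), and the function names, threshold and toy data are hypothetical, chosen only for illustration.

```python
# Illustrative only: the labels and scores below are made-up toy values,
# not data from the study. Assumed convention: 1 = high proactive, 0 = low.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")

def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true, scores, threshold=0.5):
    """Compute the seven indicators from true labels and classifier scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": safe_div(tp + tn, len(y_true)),
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
        "SEN": safe_div(tp, tp + fn),  # true positive rate
        "SPE": safe_div(tn, tn + fp),  # true negative rate
        "PPV": safe_div(tp, tp + fp),
        "NPV": safe_div(tn, tn + fn),
        "AUC": auc_from_scores(y_true, scores),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7, 0.05]
metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example SPE (0.800) exceeds SEN (0.667), the same direction of imbalance the abstract reports: the classifier is better at recognizing the negative (low-proactive) class than the positive one.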