Analysis and Evaluation of Two Feature Selection Algorithms in Improving the Performance of the Sentiment Analysis Model of Arabic Tweets

Maria Yousef, Abdulla ALali
2022 International Journal of Advanced Computer Science and Applications  
Sentiment analysis of Twitter data has recently become one of the most active research areas; it combines data mining technologies with natural language processing techniques. A sentiment analysis system evaluates texts posted on social platforms to determine whether they express positive, negative, or neutral feelings about a certain domain. The high dimensionality of the feature vector is considered one of the most common problems in Arabic sentiment analysis. The main contribution of this paper is to address the dimensionality problem by presenting a comparative study of two feature selection algorithms, Information Gain (IG) and Chi-Square, to choose the one that better improves classification accuracy. The proposed Arabic Jordanian sentiment analysis model has four steps. First, a preprocessing step is applied to the dataset: removal of non-Arabic symbols, tokenization, Arabic stop word removal, and stemming. Second, the TF-IDF algorithm is used as a feature extraction method to represent the text as feature vectors. Third, IG and Chi-Square are used for feature selection to obtain the best subset of features and reduce the total number of features. Finally, several algorithms (SVM, DT, and KNN) are used in the classification step to classify the views people have shared on Twitter into two classes (positive and negative). Several experiments were performed on Jordanian dialect tweets using the AJGT dataset. The experimental results show the following: 1) the Information Gain algorithm outperformed the Chi-Square algorithm in the feature selection step, reducing the number of features from 1170 to 713 and increasing the accuracy of the classifiers by 10%; 2) the SVM classifier showed the best classification performance among all the classifiers tested, with the highest accuracy of 85% when combined with the IG algorithm.
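The two feature selection criteria compared above can be illustrated with a minimal sketch. This is not the authors' implementation: it scores binary term presence rather than TF-IDF weights, omits the preprocessing and classification steps, and uses a hypothetical four-document toy corpus with English placeholder tokens instead of AJGT tweets. The function names (`information_gain`, `chi_square`, `select_top_k`) are illustrative assumptions.

```python
import math


def _entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(present, labels):
    """IG of a binary feature: H(class) - H(class | feature present/absent)."""
    n = len(labels)
    classes = sorted(set(labels))
    h_class = _entropy([labels.count(c) for c in classes])
    h_cond = 0.0
    for value in (True, False):
        subset = [y for p, y in zip(present, labels) if p == value]
        if subset:
            weight = len(subset) / n
            h_cond += weight * _entropy([subset.count(c) for c in classes])
    return h_class - h_cond


def chi_square(present, labels):
    """Pearson chi-square statistic of the (presence x class) contingency table."""
    n = len(labels)
    classes = sorted(set(labels))
    stat = 0.0
    for value in (True, False):
        row_total = sum(1 for p in present if p == value)
        for c in classes:
            col_total = labels.count(c)
            observed = sum(
                1 for p, y in zip(present, labels) if p == value and y == c
            )
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def select_top_k(docs, labels, score_fn, k):
    """Rank vocabulary terms by score_fn and keep the k highest-scoring ones."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {t: score_fn([t in doc for doc in docs], labels) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus (already tokenized); not taken from the AJGT dataset.
docs = [["good", "service"], ["good", "food"], ["bad", "service"], ["bad", "food"]]
labels = ["pos", "pos", "neg", "neg"]

ig_terms = select_top_k(docs, labels, information_gain, k=2)
chi_terms = select_top_k(docs, labels, chi_square, k=2)
print(ig_terms, chi_terms)
```

On this toy corpus both criteria keep the terms that separate the classes perfectly ("bad" and "good") and discard the uninformative ones; on real data the two rankings can differ, which is the comparison the paper reports.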
doi:10.14569/ijacsa.2022.0130683 fatcat:ksmefvgtdbfufmvbauppysvuhe