Classification and Prediction of Opinion Mining in Social Networks Data

Shaimaa Mohamed, Mahmoud Hussien, Arabi Keshk
2020 IJCI. International Journal of Computers and Information  
Opinion mining in social network data is considered one of the most significant and challenging tasks today, owing to the huge amount of information shared every day. We can profit from these opinions by using two important processes, classification and prediction. Although many researchers have worked on this problem, it still needs improvement. Therefore, in this paper, we present a method to improve the accuracy of the classification and prediction processes. The improvement is achieved by cleaning the dataset: converting all words to lower case; removing usernames, mentions, links, repeated characters, numbers, extra spaces between words, empty tweets, punctuation, and stop words; and expanding contractions such as "isn't" to "is not". In the feature selection phase, we use both unigrams and bigrams to extract features from the data for training. Our dataset contains users' feelings about distributed products: tweets labeled positive or negative, and each product's rating from one to five. We implemented this work using different supervised machine learning algorithms: Naïve Bayes, Support Vector Machine, and Maximum Entropy for the classification process, and Random Forest Regression, Logistic Regression, and Support Vector Regression for the prediction process. Finally, we achieve better accuracy in the classification and prediction processes than existing works. In classification, we achieved an accuracy of 90%; in prediction, the Support Vector Regression model predicts the future product rating with a mean squared error (MSE) of 0.4122, the Logistic Regression model with an MSE of 0.4986, and the Random Forest Regression model with an MSE of 0.4770.
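The cleaning steps listed in the abstract can be sketched as follows (a minimal illustration using regular expressions; the function name, contraction table, and exact regex patterns are our own assumptions, not the paper's implementation):

```python
import re

# Sketch of the tweet-cleaning pipeline described in the abstract:
# lowercasing, expanding contractions, removing mentions/links/numbers/
# punctuation, collapsing repeated characters and extra spaces.
CONTRACTIONS = {"isn't": "is not", "aren't": "are not", "don't": "do not"}

def clean_tweet(text):
    text = text.lower()
    for short, full in CONTRACTIONS.items():        # "isn't" -> "is not"
        text = text.replace(short, full)
    text = re.sub(r"@\w+", " ", text)               # usernames / mentions
    text = re.sub(r"https?://\S+", " ", text)       # links
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)      # repeated characters
    text = re.sub(r"\d+", " ", text)                # numbers
    text = re.sub(r"[^\w\s]", " ", text)            # punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()     # extra spaces
    return text

print(clean_tweet("@user Soooo goood!!! isn't it? https://t.co/x 123"))
# -> "soo good is not it"
```

Tweets that come out empty after this pass would then be dropped, and stop words filtered with a standard stop-word list.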
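The unigram-plus-bigram feature extraction and one of the classifiers can likewise be sketched (a hedged example using scikit-learn's CountVectorizer and MultinomialNB; the toy tweets and labels below are illustrative only, not the paper's dataset or tuned models):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labeled tweets (1 = positive, 0 = negative), already cleaned.
tweets = ["great product love it", "terrible quality hate it",
          "love the quality", "hate this product"]
labels = [1, 0, 1, 0]

# Unigrams and bigrams, as in the feature-selection phase.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(tweets)

# Naive Bayes classifier for the positive/negative classification task.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["love this product"])))
```

The prediction task would swap the classifier for a regressor (e.g. `sklearn.svm.SVR` or `RandomForestRegressor`) trained against the one-to-five product ratings, with MSE as the evaluation metric.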
doi:10.21608/ijci.2020.26841.1015