Sentiment Classification for Mexican Tourist Reviews based on K-NN and Jaccard Distance

Alejandra Romero-Cantón, Ramón Aranda
2021 Annual Conference of the Spanish Society for Natural Language Processing  
This paper presents a proposed solution to the Sentiment Analysis challenge of the Recommendation System for Text Mexican Tourism task at the Iberian Languages Evaluation Forum 2021. The task consists of predicting the polarity of an opinion issued by a tourist who traveled to the most representative places of Guanajuato, Mexico. Our approach is based on K-Nearest Neighbors, using a distance based on the Jaccard coefficient. In the training stage, using the training data, our approach first groups every word from every opinion (review) by its class. The stop words are then removed from each group, and the normalized frequency (NF) of each word in a class is computed. The set of words (trained words) with their NFs serves as the class feature vector. In the classification stage, when a new opinion is given, its words are intersected with the trained words of each class, and the NFs of the intersected words are summed (the dissimilarity value). The opinion is assigned to the class with the highest dissimilarity value. On the test data, our approach obtained an MAE of 1.26 and an F-measure of 0.22. We believe these results are explained by the unbalanced data, an issue our approach does not address.
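The training and classification stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tiny stop-word list, the whitespace tokenizer, and the choice to normalize each word count by the total number of words in its class are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical, tiny stop-word list used only for illustration.
STOP_WORDS = {"el", "la", "de", "y", "muy", "es", "que"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (assumed preprocessing)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def train(reviews):
    """Build one feature vector per class.

    reviews: iterable of (text, label) pairs.
    Returns {label: {word: normalized frequency}}.
    """
    counts = {}
    for text, label in reviews:
        counts.setdefault(label, Counter()).update(tokenize(text))

    model = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # Assumption: NF = word count divided by the total word count of the class.
        model[label] = {w: c / total for w, c in counter.items()}
    return model


def classify(model, text):
    """Sum the NFs of the words shared with each class and return the top class."""
    words = set(tokenize(text))
    scores = {
        label: sum(nf for w, nf in vector.items() if w in words)
        for label, vector in model.items()
    }
    return max(scores, key=scores.get)


# Toy usage with made-up reviews and polarity labels.
toy_reviews = [
    ("hermoso lugar excelente", 5),
    ("horrible sucio lugar", 1),
]
toy_model = train(toy_reviews)
predicted = classify(toy_model, "excelente y hermoso")
```

In this sketch a class with many training words spreads its frequency mass over a larger vocabulary, so the per-word NF depends on class size; this is one way an unbalanced training set could influence the summed score.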
dblp:conf/sepln/Romero-CantonA21 fatcat:dl4zkfcngzfkvms2bl4e5rze2m