A Comparative Study of Machine Translation for Multilingual Sentence-level Sentiment Analysis

Matheus Araújo, Adriano Pereira, Fabrício Benevenuto
2019 Information Sciences  
Sentiment analysis has become a key tool for several social media applications, including analysis of users' opinions about products and services, support for political campaigns, and even market trend analysis. Existing sentiment analysis methods explore different techniques, usually relying on lexical resources or learning approaches. Despite the significant interest in this theme and the amount of research effort in the field, almost all existing methods are designed to work with only English content. Most current strategies in many languages consist of adapting existing lexical resources, without presenting proper validation or basic baseline comparisons. In this work, we take a different step into this field. We focus on evaluating existing efforts proposed for language-specific sentiment analysis against a simple yet effective baseline approach. To do so, we evaluate sixteen methods for sentence-level sentiment analysis proposed for English, comparing them with three language-specific methods. Based on fourteen human-labeled language-specific datasets, we provide an extensive quantitative analysis of existing multi-language approaches. Our primary results suggest that simply translating the input text from a specific language into English and then using one of the best existing methods developed for English can be better than the existing language-specific efforts evaluated. We also rank methods according to their prediction performance and identify the methods that achieved the best results using machine translation across different languages. As a final contribution to the research community, we release our code, datasets, and the iFeel 3.0 system, a web framework for multilingual sentence-level sentiment analysis. We hope our system sets up a new baseline for future sentence-level methods developed in a wide set of languages.
doi:10.1016/j.ins.2019.10.031 fatcat:tle7kqohzrgdji54qpgsqi2b5i