Mining Documents and Sentiments in Cross-lingual Context

Motaz Saad
2016 Figshare  
The aim of this thesis is to study sentiments in comparable documents. First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. We further gather English-Arabic news documents from local and foreign news agencies. The English documents are collected from the BBC website and the Arabic documents are collected from the Al Jazeera website. Second, we present a cross-lingual document similarity measure to automatically
... ve and align comparable documents. Then, we propose a cross-lingual sentiment annotation method to label source and target documents with sentiments. Finally, we use statistical measures to compare the agreement of sentiments between the source and target documents of each comparable pair. The methods presented in this thesis are language-independent and can be applied to any language pair.
doi:10.6084/m9.figshare.3204040.v1 fatcat:5kb4k2kylnc7nhdumanxjw5wpe