Graph transformer for cross-lingual plagiarism detection

Oumaima Hourrane, El Habib Benlahmar
2022 IAES International Journal of Artificial Intelligence (IJ-AI)  
The existence of vast amounts of multilingual textual data on the internet has led to cross-lingual plagiarism, which has become a serious issue in fields such as education, science, and literature. Current cross-lingual plagiarism detection approaches usually rely on syntactic and lexical properties, on external machine translation systems, or on finding similarities within a multilingual set of text documents. However, most of these methods are designed for literal plagiarism, such as copy-and-paste, and their performance degrades on more complex cases of plagiarism, including paraphrasing. In this paper, we propose a new graph-based approach that represents text passages in different languages using knowledge graphs. We put forward a new graph structure modeling method based on the Transformer architecture that employs precise relation encoding and provides a more efficient global graph representation. The mappings between the graphs are learned under both semi-supervised and unsupervised training. Our experiments on Arabic–English, French–English, and Spanish–English plagiarism detection show that the graph transformer method outperforms state-of-the-art cross-lingual plagiarism detection approaches on cases both with and without paraphrasing, and they provide further insight into the use of knowledge graphs in a language-independent model.
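The abstract does not give the model's equations, so the following is only a rough illustration of what "relation encoding" in a graph transformer can mean. It is a minimal sketch under stated assumptions, not the authors' method. It assumes that each node pair (i, j) carries a relation embedding that is added to node j's key and value (in the style of relative-position attention), that learned query/key/value projections are replaced by identity maps, that mean pooling serves as the global graph representation, and that two passage graphs are compared by cosine similarity in an already shared cross-lingual embedding space. All function names (`relation_aware_attention`, `mean_pool`, `graph_similarity`) and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical relation table: (i, j) -> embedding of the relation from node i to node j.
Relations = Dict[Tuple[int, int], Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_aware_attention(nodes: List[Vector], relations: Relations) -> List[Vector]:
    """One attention head over a non-empty graph. The embedding r_ij of the
    relation from node i to node j is added to node j's key and value, so the
    attention weights depend on graph structure, not only on node content.
    Learned Q/K/V projections are omitted (identity) to keep the sketch short."""
    d = len(nodes[0])
    zero = [0.0] * d  # node pairs with no stored relation get a zero vector
    out = []
    for i, query in enumerate(nodes):
        keys = [
            [h + r for h, r in zip(node, relations.get((i, j), zero))]
            for j, node in enumerate(nodes)
        ]
        weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
        # Values equal keys here because all projections are identity.
        out.append([sum(w * k[t] for w, k in zip(weights, keys)) for t in range(d)])
    return out


def mean_pool(nodes: List[Vector]) -> Vector:
    # Global graph vector: the average of the node vectors.
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def graph_similarity(g1: List[Vector], r1: Relations,
                     g2: List[Vector], r2: Relations) -> float:
    """Encode each passage graph, pool it, and compare the two graph vectors.
    Assumes both graphs' embeddings already live in one shared space."""
    return cosine(mean_pool(relation_aware_attention(g1, r1)),
                  mean_pool(relation_aware_attention(g2, r2)))


if __name__ == "__main__":
    # Toy graphs with made-up 2-d embeddings, purely for illustration.
    source = [[1.0, 0.0], [0.0, 1.0]]
    suspicious = [[0.9, 0.1], [0.1, 0.9]]
    rel = {(0, 1): [0.5, 0.5]}
    print(round(graph_similarity(source, rel, suspicious, rel), 3))
```

A detector built along these lines might flag a source/suspicious pair when the similarity exceeds a tuned threshold. That threshold, the pooling choice, and the scoring rule are all assumptions of this sketch.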
doi:10.11591/ijai.v11.i3.pp905-915