A Text Alignment Algorithm Based on Prediction of Obfuscation Types Using SVM Neural Network

Fatemeh Mashhadirajab, Mehrnoush Shamsfard
2016 Forum for Information Retrieval Evaluation  
In this paper, we describe our text alignment algorithm, which achieved first rank in the Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies. Knowing the type of obfuscation lets a plagiarism detection system apply the algorithm best suited to each type. For this purpose, we use an SVM neural network to classify document pairs according to the obfuscation strategy used in each pair. We then set the parameter values in our text alignment algorithm based on the detected type of obfuscation. The results of our algorithm on the test and training datasets of Persian Plagdet 2016 are shown in this article.
CCS Concepts • Information systems → Near-duplicate and plagiarism detection • Text mining → Paraphrase detection → Plagiarism detection → Support Vector Machine.
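The two-stage idea in the abstract (predict the obfuscation type of a document pair, then pick alignment parameters for that type) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the obfuscation labels, the parameter names `min_seed_similarity` and `max_gap`, their values, and `toy_classifier`, which is a threshold rule standing in for a trained SVM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class AlignmentParams:
    """Hypothetical tuning knobs for a text alignment step."""
    min_seed_similarity: float  # assumed: threshold for accepting a seed match
    max_gap: int                # assumed: max sentence gap when merging seeds


# Hypothetical obfuscation labels and parameter values (not from the paper).
PARAMS_BY_TYPE: Dict[str, AlignmentParams] = {
    "none": AlignmentParams(min_seed_similarity=0.9, max_gap=2),
    "random": AlignmentParams(min_seed_similarity=0.5, max_gap=6),
    "simulated": AlignmentParams(min_seed_similarity=0.3, max_gap=10),
}
DEFAULT_PARAMS = AlignmentParams(min_seed_similarity=0.5, max_gap=6)


def select_params(
    features: Sequence[float],
    classify: Callable[[Sequence[float]], str],
) -> AlignmentParams:
    """Predict the obfuscation type, then look up its parameter set."""
    label = classify(features)
    # Fall back to a default when the classifier returns an unknown label.
    return PARAMS_BY_TYPE.get(label, DEFAULT_PARAMS)


def toy_classifier(features: Sequence[float]) -> str:
    """Stand-in for a trained SVM: thresholds one lexical-overlap feature."""
    overlap = features[0]
    if overlap > 0.8:
        return "none"
    if overlap > 0.4:
        return "random"
    return "simulated"


if __name__ == "__main__":
    print(select_params([0.95], toy_classifier))
    print(select_params([0.10], toy_classifier))
```

Passing the classifier in as a callable keeps the parameter lookup independent of how the obfuscation type is predicted, so a trained model could replace `toy_classifier` without changing `select_params`.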
dblp:conf/fire/MashhadirajabS16 fatcat:g5lybkuohzbmvge2sqnfu6lg2u