Syntactic and Semantic Based Similarity Measurement for Plagiarism Detection

In the digital era, a huge number of documents are available online, which leads to plagiarism. Plagiarism is the act of copying another person's work. Paper-based documents are stored in digital libraries for future reference. Historically, the Latin word "plagiarius" was used to indicate the act of stealing someone else's work. Plagiarism is the act of using another person's ideas, concepts, words or structures without citing their source where original work is ... cted from the users. The main objective of this paper is to compare the contents of an original document with the contents of other documents and to identify the matches. These matches are determined by both syntactic matching and semantic similarity. The paper employs a Sentence Hashing Algorithm for Plagiarism Detection, which focuses on complete sentence sequences and calculates a hash-sum for the sentence sequences. When the user compares the original document against several documents, a similarity value of 1 means that the content of the original document is 100% identical to that of the compared document, i.e., fully plagiarized. A similarity value between 0.1 and 0.9 means that the document is partially plagiarized. A similarity value of 0 means that the original document is unique.
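
The sentence-hashing idea and the similarity scale described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no details of the hash function, the sentence splitter or the scoring formula, so SHA-1, the punctuation-based splitting, the ASCII-only normalization and the "fraction of matched sentences" score are all assumptions made here for illustration, and the function names are hypothetical. The sketch covers only exact (syntactic) sentence matches, not semantic similarity.

```python
import hashlib
import re


def sentence_hashes(text):
    """Return one hash per non-empty sentence of text, in order."""
    hashes = []
    # Assumption: sentences end at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        # Assumption: normalize by lowercasing and keeping only ASCII
        # letters and digits, so case and punctuation do not matter.
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        if words:
            normalized = " ".join(words)
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
            hashes.append(digest)
    return hashes


def similarity(original, candidate):
    """Fraction of the original's sentences whose hash occurs in candidate."""
    original_hashes = sentence_hashes(original)
    if not original_hashes:
        return 0.0
    candidate_hashes = set(sentence_hashes(candidate))
    matched = sum(1 for h in original_hashes if h in candidate_hashes)
    return matched / len(original_hashes)


def verdict(score):
    """Map a similarity value to the three outcomes named in the abstract."""
    if score == 1.0:
        return "fully plagiarized"
    if score == 0.0:
        return "unique"
    return "partially plagiarized"


if __name__ == "__main__":
    original = "The cat sat. Dogs bark loudly."
    candidate = "Dogs bark loudly! Birds sing."
    score = similarity(original, candidate)
    print(score, verdict(score))
```

In this sketch, one of the two sentences in `original` reappears in `candidate`, so the score falls strictly between 0 and 1 and is reported as partially plagiarized. Hashing whole sentences makes the comparison cheap, but any rewording of a sentence changes its hash, which is presumably why the paper pairs the syntactic match with a semantic similarity measure.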
doi:10.35940/ijitee.a5268.129219 fatcat:e4oiwd62sjbvjmsacmx3qmf67q