Paraphrase Detection Based on Identical Phrase and Similar Word Matching

Hoang-Quoc Nguyen-Son, Yusuke Miyao, Isao Echizen
2015 Pacific Asia Conference on Language, Information and Computation  
Paraphrase detection has numerous important applications in natural language processing (such as clustering, summarizing, and detecting plagiarism). One approach to detecting paraphrases is to use predicate-argument tuples. Although this approach achieves high paraphrase recall, its accuracy is generally low. Other approaches focus on matching similar words, but word meaning often depends on context (e.g., 'get along with,' 'look forward to'). An effective approach to detecting plagiarism would take into account that plagiarists frequently cut and paste whole phrases and/or replace several words with similar words. As a result, the paraphrased text generally contains identical phrases and similar words. Moreover, plagiarists usually insert and/or remove various minor words (prepositions, conjunctions, etc.) both to make the text read more naturally and to disguise the paraphrasing. We have developed a similarity matching (SimMat) metric for paraphrase detection that matches identical phrases and similar words and quantifies the minor words. When combined with eight standard machine translation metrics, the metric achieved its highest paraphrase detection accuracy (77.6%), slightly better than the 77.4% achieved by the state-of-the-art approach for paraphrase detection.
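The abstract describes the metric only at a high level. The following is a minimal, hypothetical sketch of the general idea, not the authors' SimMat formula. It first greedily matches identical multi-word phrases, then matches remaining content words that are identical or listed as similar, and finally scores the pair so that unmatched minor words cost less than unmatched content words. The minor-word list, the toy synonym dictionary, the `minor_weight` value of 0.2, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical list of "minor" words (prepositions, conjunctions, articles).
MINOR_WORDS: Set[str] = {"a", "an", "the", "of", "to", "in", "on", "and",
                         "or", "with", "for", "at", "by"}
PUNCT = ".,;:!?'\""


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def match_identical_phrases(a: List[str], b: List[str],
                            min_len: int = 2) -> Tuple[Set[int], Set[int]]:
    """Greedily mark the longest shared contiguous phrases of >= min_len tokens."""
    used_a: Set[int] = set()
    used_b: Set[int] = set()
    while True:
        best = (0, -1, -1)  # (length, start in a, start in b)
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and i + k not in used_a and j + k not in used_b):
                    k += 1
                if k > best[0]:
                    best = (k, i, j)
        length, i, j = best
        if length < min_len:
            return used_a, used_b
        used_a.update(range(i, i + length))
        used_b.update(range(j, j + length))


def match_similar_words(a: List[str], b: List[str], used_a: Set[int],
                        used_b: Set[int], synonyms: Dict[str, Set[str]]) -> None:
    """Pair leftover content words that are identical or listed as synonyms."""
    for i, w in enumerate(a):
        if i in used_a or w in MINOR_WORDS:
            continue
        for j, v in enumerate(b):
            if j in used_b or v in MINOR_WORDS:
                continue
            if w == v or v in synonyms.get(w, set()) or w in synonyms.get(v, set()):
                used_a.add(i)
                used_b.add(j)
                break


def similarity_score(s1: str, s2: str, synonyms: Dict[str, Set[str]],
                     minor_weight: float = 0.2) -> float:
    """Weighted fraction of matched tokens; minor words carry less weight."""
    a, b = tokenize(s1), tokenize(s2)
    used_a, used_b = match_identical_phrases(a, b)
    match_similar_words(a, b, used_a, used_b, synonyms)

    def weighted(tokens: List[str], used: Set[int]) -> Tuple[float, float]:
        matched = total = 0.0
        for idx, tok in enumerate(tokens):
            w = minor_weight if tok in MINOR_WORDS else 1.0
            total += w
            if idx in used:
                matched += w
        return matched, total

    ma, ta = weighted(a, used_a)
    mb, tb = weighted(b, used_b)
    return 0.0 if ta + tb == 0 else (ma + mb) / (ta + tb)
```

For example, `similarity_score("He cut and pasted the whole phrase", "He cut and pasted the entire phrase", {"whole": {"entire"}})` matches "he cut and pasted the" as an identical phrase and "whole"/"entire" as similar words. Dropping "the" from one sentence lowers the score less than inserting a content word such as "big". A practical version would replace the toy synonym dictionary with a proper lexical resource.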
dblp:conf/paclic/Nguyen-SonME15 fatcat:dnvdhsgkjrd2howw36vx2fqkgi