KEC@DPIL-FIRE2016: Detection of Paraphrases in Indian Languages (Tamil)

R. Thangarajan, S. V. Kogilavani, A. Karthic, S. Jawahar
2016 Forum for Information Retrieval Evaluation  
This paper reports on the participation of the team NLP@KEC of Kongu Engineering College in Detecting Paraphrases in Indian Languages (DPIL), specifically for the Tamil language. Automatic paraphrase detection is an intellectually demanding task with many applications, such as plagiarism detection and new event detection. A paraphrase is defined as the expression of a given fact in more than one way by means of different phrases. Paraphrase identification is a classic natural language processing task which is of [...] classification type. Although there are several algorithms for paraphrase identification, reflecting the semantic relations between the constituent parts of a sentence plays a very important role. In this paper we use sixteen different features to represent the similarity between sentences. The proposed approach uses machine learning algorithms such as Support Vector Machine and Maximum Entropy to classify a given sentence pair. Sentence pairs are classified as Paraphrase or Not-a-Paraphrase for Task 1, and as Paraphrase, Not-a-Paraphrase or Semi-Paraphrase for Task 2. The performance of these methods is measured using accuracy, precision, recall, F-measure and macro F-measure. Our methodology took 2nd place in the DPIL evaluation track.
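The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: per-class precision, recall and F-measure, with macro F-measure taken as the unweighted mean of the per-class F-measures. The label codes `P`, `NP` and `SP`, the function names and the toy data are hypothetical and are not taken from the paper or from the DPIL evaluation scripts.

```python
from collections import Counter


def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f_measure)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


def accuracy(gold, predicted):
    """Fraction of sentence pairs whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f_measure(gold, predicted):
    """Unweighted mean of the per-class F-measures."""
    scores = per_class_scores(gold, predicted)
    return sum(f for _, _, f in scores.values()) / len(scores)


# Hypothetical three-class labels in the style of Task 2:
# P = Paraphrase, NP = Not-a-Paraphrase, SP = Semi-Paraphrase.
gold = ["P", "P", "NP", "NP", "SP", "SP"]
pred = ["P", "SP", "NP", "NP", "SP", "P"]

if __name__ == "__main__":
    print(per_class_scores(gold, pred))
    print(accuracy(gold, pred), macro_f_measure(gold, pred))
```

Because macro averaging weights every class equally, a rare class such as Semi-Paraphrase influences the score as much as a frequent one, which is one reason it is commonly reported alongside accuracy for multi-class tasks.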
dblp:conf/fire/ThangarajanKKJ16 fatcat:zlhrcmqzlrd2jhcj45hiif6ad4