HIT2016@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting

Leilei Kong, Kaisheng Chen, Liuyang Tian, Zhenyuan Hao, Zhongyuan Han, Haoliang Qi
2016 Forum for Information Retrieval Evaluation  
Paraphrase detection is an important and challenging task, with applications in paraphrase generation and extraction, machine translation, question answering, and plagiarism detection. Because a paraphrase expresses the same meaning as the original sentence in different words, traditional methods based on lexical similarity are ineffective. In this paper, we describe our approach to Detecting Paraphrases in Indian Languages, a workshop track proposed by the Forum for Information Retrieval Evaluation (FIRE) 2016. We formalize the task as a classification problem and use a supervised learning method based on Gradient Tree Boosting to classify the types of paraphrase plagiarism. The classifier uses Meteor-like features, inspired by the Meteor evaluation metric for machine translation. In the evaluation, our approach achieved the highest Overall Score (0.77), the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure on Punjabi Task2 in the 2016 FIRE Detecting Paraphrases in Indian Languages task.
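The abstract does not list the exact Meteor-like features, so the following is only a minimal sketch of what such features might look like, assuming whitespace tokenization and exact unigram matching. The function name `meteor_like_features` and the parameter `alpha` are hypothetical; the recall-weighted F-mean follows the parameterized form used by the Meteor metric, where `alpha = 0.9` reproduces the original 10PR / (R + 9P) weighting. It is not the authors' feature set.

```python
from collections import Counter


def meteor_like_features(sentence_a, sentence_b, alpha=0.9):
    """Return unigram precision, recall and a recall-weighted F-mean.

    Assumptions (not taken from the paper): tokens are split on
    whitespace and only exact token matches are counted.
    """
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()

    # Clipped unigram matches: each token counts at most as often
    # as it occurs in both sentences.
    overlap = Counter(tokens_a) & Counter(tokens_b)
    matches = sum(overlap.values())

    precision = matches / len(tokens_a) if tokens_a else 0.0
    recall = matches / len(tokens_b) if tokens_b else 0.0

    if precision == 0.0 or recall == 0.0:
        f_mean = 0.0
    else:
        # Weighted harmonic mean; alpha close to 1 favours recall.
        f_mean = (precision * recall) / (
            alpha * precision + (1 - alpha) * recall
        )

    return {"precision": precision, "recall": recall, "f_mean": f_mean}
```

A feature vector built this way for each sentence pair could then be passed to a gradient tree boosting classifier, which is the learning method the abstract names.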
dblp:conf/fire/KongCTHHQ16 fatcat:6tudpxqwurh3joi4ndeuoqoqsm