NLP-NITMZ@DPIL-FIRE2016: Language Independent Paraphrases Detection

Sandip Sarkar, Saurav Saha, Jereemi Bentham, Partha Pakray, Dipankar Das, Alexander F. Gelbukh
2016 Forum for Information Retrieval Evaluation  
This paper describes the NLP-NITMZ system submitted to the DPIL shared task at the Forum for Information Retrieval Evaluation (FIRE 2016). The main aim of the DPIL shared task is to detect paraphrases in Indian languages. Paraphrase detection is an important component of Information Retrieval, Document Summarization, Question Answering, Plagiarism Detection, and related fields. Our approach uses a language-independent feature set to detect paraphrases in Indian languages. The features are mainly based on lexical similarity. The system uses three features: Jaccard Similarity, length-normalized Edit Distance, and Cosine Similarity. Finally, a Probabilistic Neural Network (PNN) is trained on this feature set to detect paraphrases. With this feature set, we achieved 88.13% average accuracy in Sub-Task 1 and 71.98% average accuracy in Sub-Task 2.
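The three lexical features named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how text is tokenized or whether edit distance is computed over characters or words, so the whitespace tokenization, the character-level Levenshtein distance, the normalization by the longer string's length, and the helper name `paraphrase_features` are all assumptions made here.

```python
import math
from collections import Counter


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def normalized_edit_distance(s, t):
    """Character-level Levenshtein distance divided by the longer length (assumed normalization)."""
    if not s and not t:
        return 0.0
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(s), len(t))


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the two term-frequency vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def paraphrase_features(sentence_a, sentence_b):
    """Hypothetical helper: build the three-element feature vector for a sentence pair."""
    tokens_a, tokens_b = sentence_a.split(), sentence_b.split()  # assumed tokenization
    return [
        jaccard_similarity(tokens_a, tokens_b),
        normalized_edit_distance(sentence_a, sentence_b),
        cosine_similarity(tokens_a, tokens_b),
    ]
```

Because none of these functions depends on language-specific resources such as stemmers or lexicons, the same code applies unchanged to any language whose words are separated by whitespace. Vectors of this shape would then be passed to a classifier, which in the system described above is a Probabilistic Neural Network.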
dblp:conf/fire/SarkarSBPDG16 fatcat:rig2bbwhfjbthbn3mnx6vz4zdq